comparison writeup/techreport.tex @ 415:1e9788ce1680

Added the parts concerning the transformations I'd announced I'd do: Local elastic deformations; occlusions; gimp transformations; salt and pepper noise; background images
author fsavard
date Thu, 29 Apr 2010 17:21:48 -0400
parents 4f69d915d142
children 5f9d04dda707
414:3dba84c0fbc1 415:1e9788ce1680
90 We generate an affine transform matrix according to the complexity level, then we apply it directly to the image. 90 We generate an affine transform matrix according to the complexity level, then we apply it directly to the image.
91 This allows to produce scaling, translation, rotation and shearing variances. We took care that the maximum rotation applied 91 This allows to produce scaling, translation, rotation and shearing variances. We took care that the maximum rotation applied
92 to the image is low enough not to confuse classes. 92 to the image is low enough not to confuse classes.
93 93
94 \subsection{Local Elastic Deformations} 94 \subsection{Local Elastic Deformations}
95 \subsection{GIMP transformation} 95
96 This filter induces a "wiggly" effect in the image. The description here will be brief, as the algorithm follows precisely what is described in .
97
98 The general idea is to generate two "displacement" fields, one for the horizontal and one for the vertical displacement of pixels. Each field has the same size as the original image.
99
100 To generate the transformed image, we loop over the x and y positions in the fields and take, as the output value, the value of the pixel in the original image at the (relative) position given by the displacement fields for this x and y. If that position falls outside the borders of the image, we use a value of 0 instead.
101
102 To generate a pixel in either field, a value between -1 and 1 is first drawn from a uniform distribution. Then all the pixels, in both fields, are multiplied by a constant $\alpha$ which controls the intensity of the displacements (a bigger $\alpha$ translates into larger wiggles).
103
104 As a final step, each field is convolved with a 2D Gaussian kernel of standard deviation $\sigma$. Visually, this acts as a "blur" filter: it makes neighbouring values in the displacement fields similar, which in turn makes the wiggles more coherent and less noisy.
105
106 As displacement fields were slow to compute, 50 pairs of fields were generated per complexity level, in increments of 0.1 (50 pairs for 0.1, 50 pairs for 0.2, etc.). Afterwards, given a complexity, we selected randomly among the 50 corresponding pairs.
107
108 $\sigma$ and $\alpha$ were linked to complexity through the formulas $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times \sqrt[3]{complexity}$.
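The steps above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the code used for the report: the function names (\texttt{alpha\_sigma}, \texttt{smooth}, \texttt{elastic\_deform}) are hypothetical, the Gaussian kernel is truncated at three standard deviations with zero padding at the borders, and the source position is rounded to the nearest pixel since the text does not say how non-integer positions are handled.

```python
import numpy as np

def alpha_sigma(complexity):
    """alpha and sigma as functions of complexity, per the formulas above."""
    root = complexity ** (1.0 / 3.0)
    return 10.0 * root, 10.0 - 7.0 * root

def gaussian_kernel_1d(sigma):
    # Assumption: truncate the kernel at about three standard deviations.
    radius = int(3 * sigma) + 1
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

def smooth(field, sigma):
    """Separable Gaussian blur with zero padding (an assumption)."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2

    def conv(v):
        # "full" convolution, then keep the centered part of length len(v).
        return np.convolve(v, kernel, mode="full")[radius:radius + len(v)]

    rows = np.apply_along_axis(conv, 1, field)
    return np.apply_along_axis(conv, 0, rows)

def elastic_deform(image, complexity, rng):
    alpha, sigma = alpha_sigma(complexity)
    h, w = image.shape
    # Uniform values in [-1, 1], scaled by alpha, then blurred.
    dx = smooth(alpha * rng.uniform(-1.0, 1.0, (h, w)), sigma)
    dy = smooth(alpha * rng.uniform(-1.0, 1.0, (h, w)), sigma)
    ys, xs = np.mgrid[0:h, 0:w]
    # Assumption: round the displaced position to the nearest pixel.
    src_y = np.rint(ys + dy).astype(int)
    src_x = np.rint(xs + dx).astype(int)
    inside = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(image)  # positions outside the image stay at 0
    out[inside] = image[src_y[inside], src_x[inside]]
    return out
```

A call such as \texttt{elastic\_deform(img, 0.5, np.random.default\_rng(0))} returns an array of the same shape as \texttt{img}. Precomputing the field pairs, as described above, would amount to caching \texttt{dx} and \texttt{dy} per complexity level.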
109
110 \subsection{Motion Blur}
111
112 This is a GIMP filter we applied, a "linear motion blur" in GIMP terminology. The description will be brief as it is a well-known filter.
113
114 This algorithm has two input parameters, $length$ and $angle$. The value of a pixel in the final image is the mean value of the first $length$ pixels found by moving in the $angle$ direction. An approximation of this idea is used, as following that direction does not land on exact pixel positions. This is done using the Bresenham line algorithm.
115
116 The angle, in our case, is chosen from a uniform distribution over $[0,360]$ degrees. The length, though, depends on the complexity; it's sampled from a Gaussian distribution of mean 0 and standard deviation $\sigma = 3 \times complexity$.
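The averaging along a Bresenham line can be sketched as follows. This follows the description above rather than GIMP's actual source: the names \texttt{bresenham} and \texttt{motion\_blur} are hypothetical, $length$ is assumed to be already converted to a non-negative integer, the y axis is assumed to point down the image rows, and pixels of the line that fall outside the image are simply skipped.

```python
import math

def bresenham(x0, y0, x1, y1):
    """Integer points on the segment from (x0, y0) to (x1, y1), inclusive."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return points
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy

def motion_blur(image, length, angle_deg):
    """Mean of the first `length` pixels met along the angle direction."""
    h, w = len(image), len(image[0])
    if length <= 1:
        return [row[:] for row in image]
    theta = math.radians(angle_deg)
    step_x = round((length - 1) * math.cos(theta))
    step_y = round((length - 1) * math.sin(theta))
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            line = bresenham(x, y, x + step_x, y + step_y)[:length]
            # Assumption: positions outside the image are ignored.
            values = [image[py][px] for px, py in line
                      if 0 <= px < w and 0 <= py < h]
            row.append(sum(values) / len(values))
        out.append(row)
    return out
```

For instance, blurring the one-row image \texttt{[[0.0, 0.0, 1.0, 0.0]]} with a length of 2 and an angle of 0 spreads the bright pixel over its left neighbour.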
117
118 \subsection{Pinch}
119
120 This is another GIMP filter we used. The filter is in fact named "Whirl and pinch", but we don't use the "whirl" part (whirl is set to 0). As described in GIMP, a pinch is "similar to projecting the image onto an elastic surface and pressing or pulling on the center of the surface".
121
122 Mathematically, think of drawing a circle of radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to that disk (the region inside the circle) will have its value recalculated by taking the value of another "source" pixel in the original image. The position of that source pixel is found on the line that goes through $C$ and $P$, but at some other distance $d_2$. Define $d_1$ to be the distance between $P$ and $C$. Then $d_2$ is given by $d_2 = \sin\left(\frac{\pi d_1}{2r}\right)^{-pinch}$, where $pinch$ is a parameter to the filter.
123
124 The actual value is given by bilinear interpolation considering the pixels around the (non-integer) source position.
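A minimal sketch of bilinear interpolation at a non-integer source position is given here for illustration. The function name \texttt{bilinear} is hypothetical, and treating neighbours outside the image as 0 (the background value) is an assumption.

```python
import math

def bilinear(image, x, y):
    """Interpolate image (a list of rows) at the real-valued position (x, y)."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0  # fractional offsets inside the pixel cell

    def px(ix, iy):
        # Assumption: neighbours outside the image count as background (0).
        return image[iy][ix] if 0 <= ix < w and 0 <= iy < h else 0.0

    top = (1 - fx) * px(x0, y0) + fx * px(x0 + 1, y0)
    bottom = (1 - fx) * px(x0, y0 + 1) + fx * px(x0 + 1, y0 + 1)
    return (1 - fy) * top + fy * bottom
```

At an integer position the function returns the pixel itself; halfway between four pixels it returns their mean.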
125
126 The value for $pinch$ in our case was given by sampling from a uniform distribution over the range $[-complexity, 0.7 \times complexity]$.
127
128
96 \subsection{Occlusion} 129 \subsection{Occlusion}
130
131 This filter selects random parts of other (hereafter "occlusive") letter images and places them over the original letter (hereafter "occluded") image. More precisely, having selected a subregion of the occlusive image and a destination position in the occluded image, it determines the final value of each overlapping pixel by keeping whichever of the two pixels is lighter. As a reminder, the background value is 0, black, so the value nearest to 1 is selected.
132
133 To select a subpart of the occlusive image, four numbers are generated. For compatibility with the code, we'll call them "haut", "bas", "gauche" and "droite" (respectively meaning top, bottom, left and right). Each of these numbers is selected according to a Gaussian distribution of mean $8 \times complexity$ and standard deviation $2$. This means that the larger the complexity, the bigger the occlusion will be. The absolute value is taken, as the numbers must be positive, and the maximum value is capped at 15.
134
135 These four sizes collectively define a window centered on the middle pixel of the occlusive image. This is the part that will be extracted as the occlusion.
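The window selection can be sketched as follows. This is an illustration only: the helper names \texttt{sample\_extent} and \texttt{occlusion\_window} are hypothetical, and rounding the four extents to integers and including the middle pixel itself in the window are assumptions not stated in the text.

```python
import random

def sample_extent(complexity, rng):
    """One of haut/bas/gauche/droite: |N(8 * complexity, 2)|, capped at 15."""
    return min(abs(rng.gauss(8 * complexity, 2)), 15)

def occlusion_window(occlusive, complexity, rng):
    """Extract the window around the middle pixel of the occlusive image."""
    h, w = len(occlusive), len(occlusive[0])
    cy, cx = h // 2, w // 2
    # Assumption: extents are rounded to whole pixels.
    haut, bas, gauche, droite = (
        int(round(sample_extent(complexity, rng))) for _ in range(4)
    )
    top, bottom = max(cy - haut, 0), min(cy + bas + 1, h)
    left, right = max(cx - gauche, 0), min(cx + droite + 1, w)
    return [row[left:right] for row in occlusive[top:bottom]]
```

With \texttt{rng = random.Random(0)} and a $32 \times 32$ image, the returned window is at most 31 pixels on a side, since each extent is capped at 15.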
136
137 The next step is to select a destination position in the occluded image. Vertical and horizontal displacements $y\_arrivee$ and $x\_arrivee$ are selected according to Gaussian distributions of mean 0 and of standard deviations of, respectively, 3 and 2. Then a horizontal placement mode, $endroit$ (meaning location), is selected as one of three values meaning left, middle or right.
138
139 If $endroit$ is "middle", the occlusion will be horizontally centered around the horizontal middle of the occluded image, then shifted according to $x\_arrivee$. If $endroit$ is "left", it will be placed on the left of the occluded image, then displaced right according to $x\_arrivee$. The contrary happens if $endroit$ is "right".
140
141 In both the horizontal and vertical positioning, the maximum position in either direction is such that the selected occlusion won't go beyond the borders of the occluded image.
142
143 This filter has a 60\% probability of not being applied at all.
144
97 \subsection{Background Images} 145 \subsection{Background Images}
146
147 This transformation adds a random background behind the letter. The background is chosen by first selecting, at random, an image from a set of images. Then we choose a 32x32 subregion of that image as the background image (by sampling x and y positions uniformly while making sure not to cross image borders).
148
149 To combine the original letter image and the background image, contrast adjustments are made. We first get the maximal values (i.e. maximal intensity) for both the original image and the background image, $maximage$ and $maxbg$. We also have a parameter, $contrast$, given by sampling from a uniform distribution over $[complexity, 1]$.
150
151 Once we have all these numbers, we first adjust the values for the background image. Each pixel value is multiplied by $\frac{\max(maximage - contrast, 0)}{maxbg}$. Therefore the higher the contrast, the darker the background will be.
152
153 The final image is found by taking the brightest (i.e. value nearest to 1) pixel from either the background image or the corresponding pixel in the original image.
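The background step can be sketched with NumPy as below. The names \texttt{random\_crop} and \texttt{add\_background} are hypothetical, images are assumed to be float arrays with values in $[0,1]$, and the guard against an all-black background ($maxbg = 0$) is an added assumption.

```python
import numpy as np

def random_crop(source, rng, size=32):
    """Pick a size x size subregion uniformly, staying inside the borders."""
    h, w = source.shape
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    return source[y:y + size, x:x + size]

def add_background(letter, background, complexity, rng):
    contrast = rng.uniform(complexity, 1.0)
    maximage, maxbg = letter.max(), background.max()
    if maxbg == 0:
        # Assumption: an all-black background leaves the letter unchanged.
        return letter.copy()
    scale = max(maximage - contrast, 0.0) / maxbg
    # Keep the brightest of the letter pixel and the dimmed background pixel.
    return np.maximum(letter, background * scale)
```

Because the dimmed background never exceeds $maximage - contrast$, the brightest letter pixels always remain the brightest pixels of the result.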
154
98 \subsection{Salt and Pepper Noise} 155 \subsection{Salt and Pepper Noise}
156
157 This filter adds noise to the image by randomly selecting a certain number of pixels and assigning each of them a random value drawn from a uniform distribution over the range $[0,1]$. This distribution does not change with complexity. Instead, the number of selected pixels does: the proportion of changed pixels is $complexity / 5$, which means that at most 20\% of the pixels will be randomized. At the lowest extreme, no pixel is changed.
158
159 This filter also has a 25\% probability of not being applied at all.
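A short NumPy sketch of this filter follows. The function name \texttt{salt\_and\_pepper} and its \texttt{skip\_probability} argument are hypothetical, and rounding the number of changed pixels to the nearest integer is an assumption.

```python
import numpy as np

def salt_and_pepper(image, complexity, rng, skip_probability=0.25):
    """Randomize a proportion complexity / 5 of the pixels."""
    if rng.random() < skip_probability:
        return image.copy()  # filter not applied at all
    out = image.copy()
    flat = out.reshape(-1)  # view on the (contiguous) copy
    # Assumption: round the pixel count to the nearest integer.
    n_changed = int(round(flat.size * complexity / 5.0))
    idx = rng.choice(flat.size, size=n_changed, replace=False)
    flat[idx] = rng.uniform(0.0, 1.0, size=n_changed)
    return out
```

At complexity 1 on a $32 \times 32$ image, this changes 205 of the 1024 pixels, which is the 20\% maximum mentioned above.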
160
99 \subsection{Spatially Gaussian Noise} 161 \subsection{Spatially Gaussian Noise}
100 \subsection{Color and Contrast Changes} 162 \subsection{Color and Contrast Changes}
101 163
102 \begin{figure}[h] 164 \begin{figure}[h]
103 \resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\ 165 \resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\