comparison writeup/techreport.tex @ 415:1e9788ce1680
Added the parts concerning the transformations I'd announced I'd do: Local elastic deformations; occlusions; gimp transformations; salt and pepper noise; background images
author | fsavard |
---|---|
date | Thu, 29 Apr 2010 17:21:48 -0400 |
parents | 4f69d915d142 |
children | 5f9d04dda707 |
414:3dba84c0fbc1 | 415:1e9788ce1680 |
---|---|
90 We generate an affine transform matrix according to the complexity level, then we apply it directly to the image. | 90 We generate an affine transform matrix according to the complexity level, then we apply it directly to the image. |
91 This allows us to produce scaling, translation, rotation and shearing variations. We took care that the maximum rotation applied | 91 This allows us to produce scaling, translation, rotation and shearing variations. We took care that the maximum rotation applied |
92 to the image is low enough not to confuse classes. | 92 to the image is low enough not to confuse classes. |
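The following is a minimal pure-Python sketch of applying such a matrix. It is not the report's code. It assumes grayscale images stored as lists of rows with values in $[0,1]$, a transform taken about the image center, inverse mapping with nearest-neighbor sampling, and 0 (the black background) for source positions outside the image. The helper names `affine_matrix` and `apply_affine` and the rotation/shear/scale parametrization are hypothetical. The mapping from complexity to these parameters is not shown because the text does not give it.

```python
import math


def affine_matrix(scale=1.0, angle_deg=0.0, shear=0.0, tx=0.0, ty=0.0):
    """Hypothetical 2x3 matrix [A | t] with A = scale * R(angle) @ [[1, shear], [0, 1]]."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [
        [scale * c, scale * (c * shear - s), tx],
        [scale * s, scale * (s * shear + c), ty],
    ]


def apply_affine(image, m):
    """Warp `image` (list of rows) by `m` about the image center.

    Each output pixel looks up its source through the inverse of the 2x2
    linear part (nearest neighbor); sources outside the image give 0.
    """
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    (a, b, tx), (c, d, ty) = m
    det = a * d - b * c
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Undo the translation, then invert the linear part.
            ux, uy = x - cx - tx, y - cy - ty
            sx = (d * ux - b * uy) / det + cx
            sy = (-c * ux + a * uy) / det + cy
            ix, iy = int(round(sx)), int(round(sy))
            if 0 <= ix < w and 0 <= iy < h:
                out[y][x] = image[iy][ix]
    return out
```

For example, `apply_affine([[1.0, 0.0, 0.0]], affine_matrix(tx=1.0))` moves the single bright pixel one position to the right and returns `[[0.0, 1.0, 0.0]]`. Inverse mapping is used so that every output pixel receives exactly one value.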
93 | 93 |
94 \subsection{Local Elastic Deformations} | 94 \subsection{Local Elastic Deformations} |
95 \subsection{GIMP transformation} | 95 |
96 This filter induces a ``wiggly'' effect in the image. The description here is brief, as the algorithm follows precisely what is described in . | |
97 | |
98 The general idea is to generate two displacement fields, one for horizontal and one for vertical pixel displacements. Each field has the same size as the original image. | |
99 | |
100 To generate the transformed image, we loop over the $x$ and $y$ positions in the fields. For each position, we take the value of the pixel in the original image located at $(x, y)$ offset by the horizontal and vertical displacements stored in the fields at that position. If this source position lies outside the borders of the image, a value of 0 is used instead. | |
101 | |
102 To generate a pixel in either field, a value is first drawn from a uniform distribution over $[-1, 1]$. Then all the pixels in both fields are multiplied by a constant $\alpha$, which controls the intensity of the displacements (a larger $\alpha$ produces larger wiggles). | |
103 | |
104 As a final step, each field is convolved with a 2D Gaussian kernel of standard deviation $\sigma$, which visually amounts to a blur. This makes neighboring values in the displacement fields similar, so the resulting wiggles are more coherent and less noisy. | |
105 | |
106 As displacement fields were slow to compute, 50 pairs of fields were precomputed for each complexity level in increments of 0.1 (50 pairs for 0.1, 50 pairs for 0.2, etc.). Afterwards, given a complexity, one of the 50 corresponding pairs was selected at random. | |
107 | |
108 $\sigma$ and $\alpha$ were linked to complexity through the formulas $\alpha = 10.0 \times \sqrt[3]{complexity}$ and $\sigma = 10 - 7 \times \sqrt[3]{complexity}$, so higher complexities yield both larger and less smoothed (more local) displacements. | |
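The procedure described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the report's implementation. The Gaussian convolution is done as two separable 1D passes truncated at $3\sigma$ and zero-padded at the borders. Fractional source positions are rounded to the nearest pixel, because the text does not say whether interpolation is used. The helper names (`gaussian_kernel_1d`, `gaussian_blur`, `displacement_fields`, `elastic_deform`) are hypothetical.

```python
import math
import random


def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian kernel, truncated at 3 standard deviations."""
    radius = max(1, int(math.ceil(3.0 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [wt / total for wt in weights]


def gaussian_blur(field, sigma):
    """2D Gaussian blur as two separable 1D passes, zero-padded at the borders."""
    k = gaussian_kernel_1d(sigma)
    r = len(k) // 2
    h, w = len(field), len(field[0])
    rows = [[sum(k[j + r] * field[y][x + j] for j in range(-r, r + 1) if 0 <= x + j < w)
             for x in range(w)] for y in range(h)]
    return [[sum(k[j + r] * rows[y + j][x] for j in range(-r, r + 1) if 0 <= y + j < h)
             for x in range(w)] for y in range(h)]


def displacement_fields(complexity, h, w, rng=None):
    """Return (dx, dy): uniform noise in [-1, 1] scaled by alpha, then blurred by sigma."""
    if rng is None:
        rng = random.Random()
    cube_root = complexity ** (1.0 / 3.0)
    alpha = 10.0 * cube_root
    sigma = 10.0 - 7.0 * cube_root
    fields = []
    for _ in range(2):
        noise = [[alpha * rng.uniform(-1.0, 1.0) for _ in range(w)] for _ in range(h)]
        fields.append(gaussian_blur(noise, sigma))
    return fields[0], fields[1]


def elastic_deform(image, dx, dy):
    """out[y][x] = image at (x + dx, y + dy), rounded to the nearest pixel; 0 if outside."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = int(round(x + dx[y][x]))
            sy = int(round(y + dy[y][x]))
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out
```

At complexity 0, $\alpha = 0$ and the image comes back unchanged. Because `gaussian_blur` dominates the cost, the precomputation described in the text (50 pairs per complexity level, reused afterwards) fits this structure: call `displacement_fields` once per stored pair and only `elastic_deform` per image.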
109 | |
110 \subsection{Motion Blur} | |
111 | |
112 This is a GIMP filter we applied, called ``linear motion blur'' in GIMP terminology. The description will be brief, as it is a well-known filter. | |
113 | |
114 This filter has two input parameters, $length$ and $angle$. The value of a pixel in the final image is the mean value of the first $length$ pixels found by moving from it in the $angle$ direction. Since following that direction generally does not land exactly on pixels, an approximation is used: the pixels along the path are selected with the Bresenham line algorithm. | |
115 | |
116 The angle, in our case, is chosen from a uniform distribution over $[0,360]$ degrees. The length, however, depends on the complexity: it is sampled from a Gaussian distribution of mean 0 and standard deviation $\sigma = 3 \times complexity$. | |
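A pure-Python sketch of this filter follows. It is not GIMP's code. Several choices in it are assumptions: the path starts at the pixel itself, pixels of the path falling outside the image are skipped, $y$ grows downward, and a negative Gaussian sample is turned into a length by taking its absolute value and rounding it, since the text does not say how negative samples are handled. The helper names `bresenham`, `motion_blur` and `sample_motion_blur_params` are hypothetical.

```python
import math
import random


def bresenham(x0, y0, x1, y1):
    """Integer pixel positions on the segment (x0, y0) -> (x1, y1), endpoints included."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return points
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def motion_blur(image, length, angle_deg):
    """Mean of the first `length` pixels met when walking from each pixel towards
    `angle_deg`; path pixels outside the image are skipped."""
    h, w = len(image), len(image[0])
    if length <= 1:
        return [row[:] for row in image]
    theta = math.radians(angle_deg)
    # Aim 2 * length away so the Bresenham path has at least `length` points.
    ex = int(round(2 * length * math.cos(theta)))
    ey = int(round(2 * length * math.sin(theta)))
    path = bresenham(0, 0, ex, ey)[:length]
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [image[y + oy][x + ox] for ox, oy in path
                    if 0 <= x + ox < w and 0 <= y + oy < h]
            out[y][x] = sum(vals) / len(vals)  # path[0] == (0, 0), so never empty
    return out


def sample_motion_blur_params(complexity, rng=None):
    """angle ~ U[0, 360]; length ~ |N(0, 3 * complexity)| rounded (assumed handling)."""
    if rng is None:
        rng = random.Random()
    angle = rng.uniform(0.0, 360.0)
    length = int(round(abs(rng.gauss(0.0, 3.0 * complexity))))
    return length, angle
```

For instance, `motion_blur([[0.0, 1.0, 0.0]], 2, 0.0)` averages each pixel with its right neighbor and returns `[[0.5, 0.5, 0.0]]`.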
117 | |
118 \subsection{Pinch} | |
119 | |
120 This is another GIMP filter we used. The filter is in fact named ``Whirl and pinch'', but we do not use the ``whirl'' part (whirl is set to 0). As described in GIMP, a pinch is ``similar to projecting the image onto an elastic surface and pressing or pulling on the center of the surface''. | |
121 | |
122 Mathematically, think of drawing a circle of radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to that disk (the region inside the circle) has its value recalculated by taking the value of another ``source'' pixel in the original image. Define $d_1$ to be the distance between $P$ and $C$. The source pixel lies on the line that goes through $C$ and $P$, on the same side of $C$ as $P$, but at another distance $d_2 = d_1 \left(\sin\frac{\pi d_1}{2r}\right)^{-pinch}$ from $C$, where $pinch$ is a parameter to the filter. Since the sine equals 1 at $d_1 = r$, pixels on the circle are left unchanged. | |
123 | |
124 The actual value is given by bilinear interpolation considering the pixels around the (non-integer) source position. | |
125 | |
126 The value for $pinch$ in our case was given by sampling from a uniform distribution over the range $[-complexity, 0.7 \times complexity]$. | |
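The pinch computation can be illustrated with the pure-Python sketch below, which scales each in-disk offset $P - C$ by $d_2 / d_1$ and reads the source value with bilinear interpolation. It is not GIMP's implementation. The center $C$ defaults to the image center and the radius $r$ to half the smaller image side; both are hypothetical defaults, since the text does not state them. The center pixel itself is left unchanged. The helper names `bilinear`, `pinch` and `sample_pinch_amount` are also hypothetical.

```python
import math
import random


def bilinear(image, x, y):
    """Bilinear interpolation at a non-integer position; outside pixels count as 0."""
    h, w = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0

    def px(xi, yi):
        return image[yi][xi] if 0 <= xi < w and 0 <= yi < h else 0.0

    top = px(x0, y0) * (1.0 - fx) + px(x0 + 1, y0) * fx
    bottom = px(x0, y0 + 1) * (1.0 - fx) + px(x0 + 1, y0 + 1) * fx
    return top * (1.0 - fy) + bottom * fy


def pinch(image, amount, radius=None, center=None):
    """A pixel P at distance d1 < r from C takes the value found along the ray C -> P
    at distance d2 = d1 * sin(pi * d1 / (2 r)) ** (-amount)."""
    h, w = len(image), len(image[0])
    cx, cy = center if center is not None else ((w - 1) / 2.0, (h - 1) / 2.0)
    r = radius if radius is not None else min(w, h) / 2.0
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            ox, oy = x - cx, y - cy
            d1 = math.hypot(ox, oy)
            if d1 == 0.0 or d1 >= r:
                continue  # the center and pixels outside the disk are unchanged
            factor = math.sin(math.pi * d1 / (2.0 * r)) ** (-amount)  # d2 / d1
            out[y][x] = bilinear(image, cx + ox * factor, cy + oy * factor)
    return out


def sample_pinch_amount(complexity, rng=None):
    """pinch ~ U[-complexity, 0.7 * complexity], as in the text."""
    if rng is None:
        rng = random.Random()
    return rng.uniform(-complexity, 0.7 * complexity)
```

With a positive `amount`, the factor exceeds 1 inside the disk, so pixels sample from farther out and the content appears drawn towards the center. A negative `amount` has the opposite effect.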
127 | |
128 | |
96 \subsection{Occlusion} | 129 \subsection{Occlusion} |
130 | |
131 This filter selects random parts of other (hereafter ``occlusive'') letter images and places them over the original (hereafter ``occluded'') letter image. More precisely, once a subregion of the occlusive image and a destination position in the occluded image have been selected, the final value of each overlapping pixel is the lighter of the two corresponding pixels. As a reminder, the background value is 0 (black), so the value nearest to 1 is selected.
132 | |
133 To select a subpart of the occlusive image, four numbers are generated. For consistency with the code, we call them ``haut'', ``bas'', ``gauche'' and ``droite'' (respectively meaning top, bottom, left and right). Each of these numbers is sampled from a Gaussian distribution of mean $8 \times complexity$ and standard deviation $2$, so the larger the complexity, the bigger the occlusion. The absolute value is taken, as the numbers must be positive, and each value is capped at 15.
134 | |
135 These four sizes collectively define a window centered on the middle pixel of the occlusive image. This is the part that will be extracted as the occlusion. | |
136 | |
137 The next step is to select a destination position in the occluded image. Vertical and horizontal displacements $y\_arrivee$ and $x\_arrivee$ are sampled from Gaussian distributions of mean 0 and standard deviations of, respectively, 3 and 2. Then a horizontal placement mode, $endroit$ (meaning location), is selected among three values meaning left, middle or right.
138 | |
139 If $endroit$ is ``middle'', the occlusion is horizontally centered on the horizontal middle of the occluded image, then shifted according to $x\_arrivee$. If $endroit$ is ``left'', it is placed on the left of the occluded image, then displaced right according to $x\_arrivee$. If $endroit$ is ``right'', the converse happens: it is placed on the right of the occluded image, then displaced left according to $x\_arrivee$.
140 | |
141 For both horizontal and vertical positioning, the position is bounded in either direction so that the selected occlusion never extends beyond the borders of the occluded image.
142 | |
143 This filter has a 60\% probability of not being applied at all.
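The occlusion steps can be sketched in pure Python as follows, reusing the names haut, bas, gauche, droite, y\_arrivee, x\_arrivee and endroit from the text. Many details here are assumptions rather than facts from the report:
- sizes are rounded to integers;
- the window includes the middle pixel;
- $endroit$ is drawn uniformly;
- the vertical base position is centered before the $y\_arrivee$ shift;
- the left and right placements use $|x\_arrivee|$;
- the occlusive image is no larger than the occluded one.

The helper names `extract_window`, `paste_lightest` and `occlude` are hypothetical.

```python
import random


def extract_window(img, haut, bas, gauche, droite):
    """Window around the middle pixel: `haut` rows above, `bas` below, etc. (clipped)."""
    h, w = len(img), len(img[0])
    cy, cx = h // 2, w // 2
    top, bottom = max(0, cy - haut), min(h, cy + bas + 1)
    left, right = max(0, cx - gauche), min(w, cx + droite + 1)
    return [row[left:right] for row in img[top:bottom]]


def paste_lightest(occluded, patch, top, left):
    """Overlay `patch` at (top, left), keeping the lighter (larger) value per pixel."""
    out = [row[:] for row in occluded]
    for i, prow in enumerate(patch):
        for j, v in enumerate(prow):
            out[top + i][left + j] = max(out[top + i][left + j], v)
    return out


def occlude(occluded, occlusive, complexity, rng=None):
    if rng is None:
        rng = random.Random()
    if rng.random() < 0.6:  # 60% of the time the filter is not applied
        return [row[:] for row in occluded]
    # haut, bas, gauche, droite ~ |N(8 * complexity, 2)|, capped at 15.
    haut, bas, gauche, droite = (
        int(round(min(abs(rng.gauss(8.0 * complexity, 2.0)), 15.0))) for _ in range(4))
    patch = extract_window(occlusive, haut, bas, gauche, droite)
    ph, pw = len(patch), len(patch[0])
    h, w = len(occluded), len(occluded[0])
    y_arrivee, x_arrivee = rng.gauss(0.0, 3.0), rng.gauss(0.0, 2.0)
    endroit = rng.choice(["left", "middle", "right"])
    top = (h - ph) / 2.0 + y_arrivee
    if endroit == "middle":
        left = (w - pw) / 2.0 + x_arrivee
    elif endroit == "left":
        left = abs(x_arrivee)
    else:
        left = (w - pw) - abs(x_arrivee)
    # Bound the position so the occlusion stays inside the occluded image.
    top = min(max(int(round(top)), 0), h - ph)
    left = min(max(int(round(left)), 0), w - pw)
    return paste_lightest(occluded, patch, top, left)
```

Taking the per-pixel maximum in `paste_lightest` means the occlusion can only brighten the occluded image, consistent with a black (0) background.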
144 | |
97 \subsection{Background Images} | 145 \subsection{Background Images} |
146 | |
147 This transformation adds a random background behind the letter. The background is chosen by first selecting, at random, an image from a set of images. Then a $32 \times 32$ subregion of that image is chosen as the background image (its $x$ and $y$ positions are sampled uniformly while making sure the subregion does not cross the image borders).
148 | |
149 To combine the original letter image and the background image, contrast adjustments are made. We first compute the maximal values (i.e.\ maximal intensities) of the original image and of the background image, $maximage$ and $maxbg$. We also use a parameter, $contrast$, sampled from a uniform distribution over $[complexity, 1]$.
150 | |
151 Once we have all these numbers, we first adjust the values of the background image: each pixel value is multiplied by $\frac{\max(maximage - contrast, 0)}{maxbg}$. Therefore, the higher the contrast, the darker the background.
152 | |
153 Each pixel of the final image is the brighter (i.e.\ nearest to 1) of the corresponding pixels in the rescaled background image and in the original image.
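The combination rule can be checked with a short pure-Python sketch. It is not the report's code. It assumes images are lists of rows with values in $[0,1]$ and that every image in the set is at least $32 \times 32$. The guard for an all-black patch ($maxbg = 0$) is an added assumption that the text does not cover. The helper names `combine_with_background` and `add_background` are hypothetical.

```python
import random


def combine_with_background(letter, bg, contrast):
    """Rescale `bg` by max(maximage - contrast, 0) / maxbg, then keep the brighter pixel."""
    maximage = max(max(row) for row in letter)
    maxbg = max(max(row) for row in bg)
    # Assumed guard: an all-black patch is simply scaled to 0.
    scale = max(maximage - contrast, 0.0) / maxbg if maxbg > 0 else 0.0
    return [[max(lv, bv * scale) for lv, bv in zip(lrow, brow)]
            for lrow, brow in zip(letter, bg)]


def add_background(letter, backgrounds, complexity, rng=None, size=32):
    """Pick a random image, crop a random size x size patch inside it, and combine."""
    if rng is None:
        rng = random.Random()
    src = rng.choice(backgrounds)
    y0 = rng.randint(0, len(src) - size)     # randint is inclusive, so the crop
    x0 = rng.randint(0, len(src[0]) - size)  # never crosses the image borders
    bg = [row[x0:x0 + size] for row in src[y0:y0 + size]]
    contrast = rng.uniform(complexity, 1.0)
    return combine_with_background(letter, bg, contrast)
```

For example, `combine_with_background([[0.0, 1.0]], [[0.5, 0.5]], 0.5)` rescales the background by $(1.0 - 0.5) / 0.5 = 1$ and returns `[[0.5, 1.0]]`. With `contrast=1.0` the background is scaled to 0 and the letter is returned unchanged.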
154 | |
98 \subsection{Salt and Pepper Noise} | 155 \subsection{Salt and Pepper Noise} |
156 | |
157 This filter adds noise to the image by randomly selecting a certain number of pixels and assigning each of them a random value drawn from a uniform distribution over $[0,1]$. This distribution does not change with the complexity; instead, the number of selected pixels does. The proportion of changed pixels is $complexity / 5$, which means that at most 20\% of the pixels are randomized (at complexity 1). At the other extreme (complexity 0), no pixel is changed.
158 | |
159 This filter also has a 25\% probability of not being applied at all.
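A pure-Python sketch of this filter follows. It is not the report's code. It assumes that exactly $\mathrm{round}(complexity / 5 \times N)$ distinct pixels are chosen (sampling without replacement) among the $N$ pixels of the image. The text only gives the proportion, so the rounding and the sampling scheme are assumptions. The helper name `salt_and_pepper` is hypothetical.

```python
import random


def salt_and_pepper(image, complexity, rng=None):
    """Replace a fraction complexity / 5 of the pixels by U[0, 1] values (skipped 25% of the time)."""
    if rng is None:
        rng = random.Random()
    out = [row[:] for row in image]
    if rng.random() < 0.25:  # the filter is not applied at all
        return out
    h, w = len(image), len(image[0])
    n = int(round(complexity / 5.0 * h * w))  # number of pixels to randomize
    for idx in rng.sample(range(h * w), n):   # distinct pixels, without replacement
        out[idx // w][idx % w] = rng.uniform(0.0, 1.0)
    return out
```

On a $10 \times 10$ image at complexity 1, at most 20 pixels change, matching the 20\% maximum stated in the text.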
160 | |
99 \subsection{Spatially Gaussian Noise} | 161 \subsection{Spatially Gaussian Noise} |
100 \subsection{Color and Contrast Changes} | 162 \subsection{Color and Contrast Changes} |
101 | 163 |
102 \begin{figure}[h] | 164 \begin{figure}[h] |
103 \resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\ | 165 \resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\ |