comparison writeup/techreport.tex @ 426:a7fab59de174

change order of transformations
author Xavier Glorot <glorotxa@iro.umontreal.ca>
date Fri, 30 Apr 2010 16:29:17 -0400
parents c06a3d9b5664
children ace489930918
425:c06a3d9b5664 426:a7fab59de174
116 116
117 Because displacement fields were slow to compute, 50 pairs of fields were generated per complexity in increments of 0.1 (50 pairs for 0.1, 50 pairs for 0.2, etc.), and afterwards, given a complexity, we selected randomly among the 50 corresponding pairs. 117 Because displacement fields were slow to compute, 50 pairs of fields were generated per complexity in increments of 0.1 (50 pairs for 0.1, 50 pairs for 0.2, etc.), and afterwards, given a complexity, we selected randomly among the 50 corresponding pairs.
118 118
119 $\sigma$ and $\alpha$ were linked to complexity through the formulas $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times \sqrt[3]{complexity}$. 119 $\sigma$ and $\alpha$ were linked to complexity through the formulas $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times \sqrt[3]{complexity}$.
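The two formulas above can be sketched in Python as follows. This is only an illustration: the function name \texttt{elastic\_params} is hypothetical and is not taken from the project's code.

```python
def elastic_params(complexity):
    """Map a complexity in [0, 1] to (alpha, sigma) with the formulas above."""
    root = complexity ** (1.0 / 3.0)  # cube root of the complexity
    alpha = root * 10.0               # alpha = cbrt(complexity) * 10.0
    sigma = 10.0 - 7.0 * root         # sigma = 10 - 7 * cbrt(complexity)
    return alpha, sigma
```

As complexity goes from 0 to 1, $\alpha$ grows from 0 to 10 while $\sigma$ shrinks from 10 to 3.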
120 120
121
122 \subsection{Pinch}
123
124 This is another GIMP filter we used. The filter is in fact named "Whirl and pinch", but we don't use the "whirl" part (whirl is set to 0). As described in GIMP, a pinch is "similar to projecting the image onto an elastic surface and pressing or pulling on the center of the surface".
125
126 Mathematically, for a square input image, think of drawing a circle of radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to that disk (the region inside the circle) will have its value recalculated by taking the value of another "source" pixel in the original image. The position of that source pixel is found on the line that goes through $C$ and $P$, but at some other distance $d_2$. Define $d_1$ to be the distance between $P$ and $C$. $d_2$ is given by $d_2 = \sin(\frac{\pi{}d_1}{2r})^{-pinch} \times d_1$, where $pinch$ is a parameter to the filter.
127
128 If the region considered is not square then, before computing $d_2$, the smallest dimension (x or y) is stretched such that we may consider the region as if it was square. Then, after $d_2$ has been computed and corresponding components $d_2\_x$ and $d_2\_y$ have been found, the component corresponding to the stretched dimension is compressed back by an inverse ratio.
129
130 The actual value is given by bilinear interpolation considering the pixels around the (non-integer) source position thus found.
131
132 The value for $pinch$ in our case was given by sampling from a uniform distribution over the range $[-complexity, 0.7 \times complexity]$.
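A minimal Python sketch of the source-distance formula and of the sampling of $pinch$ is given here. The function names are hypothetical, and the special case for $d_1 = 0$ is an assumption of this sketch (it avoids raising zero to a negative power), not something GIMP's documentation states.

```python
import math
import random


def sample_pinch(complexity, rng=random):
    """Draw pinch uniformly from [-complexity, 0.7 * complexity]."""
    return rng.uniform(-complexity, 0.7 * complexity)


def source_distance(d1, r, pinch):
    """Return d2 = sin(pi * d1 / (2 r)) ** (-pinch) * d1 for 0 <= d1 <= r."""
    if d1 == 0.0:
        # Assumption: the center pixel is its own source.
        return 0.0
    return math.sin(math.pi * d1 / (2.0 * r)) ** (-pinch) * d1
```

On the circle itself ($d_1 = r$) the sine equals 1, so $d_2 = d_1$ and the border of the disk is left in place. With $pinch = 0$ every pixel is its own source.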
133
121 \subsection{Motion Blur} 134 \subsection{Motion Blur}
122 135
123 This is a GIMP filter we applied, a "linear motion blur" in GIMP terminology. The description will be brief as it is a well-known filter. 136 This is a GIMP filter we applied, a "linear motion blur" in GIMP terminology. The description will be brief as it is a well-known filter.
124 137
125 This algorithm has two input parameters, $length$ and $angle$. The value of a pixel in the final image is the mean value of the $length$ first pixels found by moving in the $angle$ direction. An approximation of this idea is used, as we won't fall onto precise pixels by following that direction. This is done using the Bresenham line algorithm. 138 This algorithm has two input parameters, $length$ and $angle$. The value of a pixel in the final image is the mean value of the $length$ first pixels found by moving in the $angle$ direction. An approximation of this idea is used, as we won't fall onto precise pixels by following that direction. This is done using the Bresenham line algorithm.
126 139
127 The angle, in our case, is chosen from a uniform distribution over $[0,360]$ degrees. The length, though, depends on the complexity; it's sampled from a Gaussian distribution of mean 0 and standard deviation $\sigma = 3 \times complexity$. 140 The angle, in our case, is chosen from a uniform distribution over $[0,360]$ degrees. The length, though, depends on the complexity; it's sampled from a Gaussian distribution of mean 0 and standard deviation $\sigma = 3 \times complexity$.
128 141
129 \subsection{Pinch} 142 \subsection{Occlusion}
130 143
131 This is another GIMP filter we used. The filter is in fact named "Whirl and pinch", but we don't use the "whirl" part (whirl is set to 0). As described in GIMP, a pinch is "similar to projecting the image onto an elastic surface and pressing or pulling on the center of the surface". 144 This filter selects random parts of other (hereafter "occlusive") letter images and places them over the original letter (hereafter "occluded") image. To be more precise, having selected a subregion of the occlusive image and a destination position in the occluded image, to determine the final value for a given overlapping pixel, it selects whichever pixel is the lightest. As a reminder, the background value is 0, black, so the value nearest to 1 is selected.
132 145
133 Mathematically, for a square input image, think of drawing a circle of radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to that disk (the region inside the circle) will have its value recalculated by taking the value of another "source" pixel in the original image. The position of that source pixel is found on the line that goes through $C$ and $P$, but at some other distance $d_2$. Define $d_1$ to be the distance between $P$ and $C$. $d_2$ is given by $d_2 = \sin(\frac{\pi{}d_1}{2r})^{-pinch} \times d_1$, where $pinch$ is a parameter to the filter. 146 To select a subpart of the occlusive image, four numbers are generated. For compatibility with the code, we'll call them "haut", "bas", "gauche" and "droite" (respectively meaning top, bottom, left and right). Each of these numbers is selected according to a Gaussian distribution of mean $8 \times complexity$ and standard deviation $2$. This means that the larger the complexity is, the bigger the occlusion will be. The absolute value is taken, as the numbers must be positive, and the maximum value is capped at 15.
134 147
135 If the region considered is not square then, before computing $d_2$, the smallest dimension (x or y) is stretched such that we may consider the region as if it was square. Then, after $d_2$ has been computed and corresponding components $d_2\_x$ and $d_2\_y$ have been found, the component corresponding to the stretched dimension is compressed back by an inverse ratio. 148 These four sizes collectively define a window centered on the middle pixel of the occlusive image. This is the part that will be extracted as the occlusion.
136 149
137 The actual value is given by bilinear interpolation considering the pixels around the (non-integer) source position thus found. 150 The next step is to select a destination position in the occluded image. Vertical and horizontal displacements $y\_arrivee$ and $x\_arrivee$ are selected according to Gaussian distributions of mean 0 and of standard deviations of, respectively, 3 and 2. Then a horizontal placement mode, $endroit$ (meaning location), is selected from three values meaning left, middle or right.
138 151
139 The value for $pinch$ in our case was given by sampling from a uniform distribution over the range $[-complexity, 0.7 \times complexity]$. 152 If $endroit$ is "middle", the occlusion will be horizontally centered around the horizontal middle of the occluded image, then shifted according to $x\_arrivee$. If $endroit$ is "left", it will be placed on the left of the occluded image, then displaced right according to $x\_arrivee$. The contrary happens if $endroit$ is "right".
153
154 In both the horizontal and vertical positioning, the maximum position in either direction is such that the selected occlusion won't go beyond the borders of the occluded image.
155
156 This filter has a probability of not being applied, at all, of 60\%.
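The sampling of the occlusion window and the per-pixel merge rule can be sketched as below. The keys "haut", "bas", "gauche" and "droite" come from the text; the function names are hypothetical. The sketch keeps the sizes as floats, since the text does not say how they are rounded to pixel counts.

```python
import random

MAX_SIZE = 15  # cap on each of the four sizes, as stated in the text


def sample_extent(complexity, rng=random):
    """One size: |N(8 * complexity, 2)| capped at 15."""
    return min(abs(rng.gauss(8.0 * complexity, 2.0)), MAX_SIZE)


def sample_window(complexity, rng=random):
    """The four sizes (top, bottom, left, right) of the occlusion window."""
    names = ("haut", "bas", "gauche", "droite")
    return {name: sample_extent(complexity, rng) for name in names}


def merge_pixel(occluded, occlusive):
    """Keep the lightest value; the background is 0 (black)."""
    return max(occluded, occlusive)


def should_apply(rng=random, skip_probability=0.6):
    """The filter is skipped entirely with probability 0.6."""
    return rng.random() >= skip_probability
```

Because the lightest pixel wins, a black (background) region of the occlusive patch never erases strokes of the occluded letter.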
140 157
141 158
142 \subsection{Distorsion gauss} 159 \subsection{Distorsion gauss}
160
143 This filter simply adds, to each pixel of the image independently, a gaussian noise of mean $0$ and standard deviation $\frac{complexity}{10}$. 161 This filter simply adds, to each pixel of the image independently, a gaussian noise of mean $0$ and standard deviation $\frac{complexity}{10}$.
144 162
145 It has a probability of not being applied, at all, of 70\%. 163 It has a probability of not being applied, at all, of 70\%.
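A minimal sketch of this filter, assuming the image is stored as a list of rows of floats. The function name and the \texttt{skip\_probability} parameter are hypothetical. The sketch does not clip the result to $[0,1]$, because the text does not say whether clipping is done.

```python
import random


def distorsion_gauss(image, complexity, rng=random, skip_probability=0.7):
    """Add independent N(0, complexity / 10) noise to every pixel."""
    if rng.random() < skip_probability:
        # The filter is not applied at all: return an unchanged copy.
        return [list(row) for row in image]
    sigma = complexity / 10.0
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]
```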
146 164
147 165
148 \subsection{Occlusion}
149
150 This filter selects random parts of other (hereafter "occlusive") letter images and places them over the original letter (hereafter "occluded") image. To be more precise, having selected a subregion of the occlusive image and a destination position in the occluded image, to determine the final value for a given overlapping pixel, it selects whichever pixel is the lightest. As a reminder, the background value is 0, black, so the value nearest to 1 is selected.
151
152 To select a subpart of the occlusive image, four numbers are generated. For compatibility with the code, we'll call them "haut", "bas", "gauche" and "droite" (respectively meaning top, bottom, left and right). Each of these numbers is selected according to a Gaussian distribution of mean $8 \times complexity$ and standard deviation $2$. This means that the larger the complexity is, the bigger the occlusion will be. The absolute value is taken, as the numbers must be positive, and the maximum value is capped at 15.
153
154 These four sizes collectively define a window centered on the middle pixel of the occlusive image. This is the part that will be extracted as the occlusion.
155
156 The next step is to select a destination position in the occluded image. Vertical and horizontal displacements $y\_arrivee$ and $x\_arrivee$ are selected according to Gaussian distributions of mean 0 and of standard deviations of, respectively, 3 and 2. Then a horizontal placement mode, $endroit$ (meaning location), is selected from three values meaning left, middle or right.
157
158 If $endroit$ is "middle", the occlusion will be horizontally centered around the horizontal middle of the occluded image, then shifted according to $x\_arrivee$. If $endroit$ is "left", it will be placed on the left of the occluded image, then displaced right according to $x\_arrivee$. The contrary happens if $endroit$ is "right".
159
160 In both the horizontal and vertical positioning, the maximum position in either direction is such that the selected occlusion won't go beyond the borders of the occluded image.
161
162 This filter has a probability of not being applied, at all, of 60\%.
163
164 \subsection{Background Images} 166 \subsection{Background Images}
165 167
166 This transformation adds a random background behind the letter. The background is chosen by first selecting, at random, an image from a set of images. Then we choose a 32x32 subregion of that image as the background image (by sampling x and y positions uniformly while making sure not to cross image borders). 168 This transformation adds a random background behind the letter. The background is chosen by first selecting, at random, an image from a set of images. Then we choose a 32x32 subregion of that image as the background image (by sampling x and y positions uniformly while making sure not to cross image borders).
167 169
168 To combine the original letter image and the background image, contrast adjustments are made. We first get the maximal values (i.e. maximal intensity) for both the original image and the background image, $maximage$ and $maxbg$. We also have a parameter, $contrast$, given by sampling from a uniform distribution over $[complexity, 1]$. 170 To combine the original letter image and the background image, contrast adjustments are made. We first get the maximal values (i.e. maximal intensity) for both the original image and the background image, $maximage$ and $maxbg$. We also have a parameter, $contrast$, given by sampling from a uniform distribution over $[complexity, 1]$.
176 This filter adds noise to the image by randomly selecting a certain number of pixels and, for those selected pixels, assigning a random value according to a uniform distribution over the $[0,1]$ range. This last distribution does not change according to complexity. Instead, the number of selected pixels does: the proportion of changed pixels corresponds to $complexity / 5$, which means, as a maximum, 20\% of the pixels will be randomized. On the lowest extreme, no pixel is changed. 178 This filter adds noise to the image by randomly selecting a certain number of pixels and, for those selected pixels, assigning a random value according to a uniform distribution over the $[0,1]$ range. This last distribution does not change according to complexity. Instead, the number of selected pixels does: the proportion of changed pixels corresponds to $complexity / 5$, which means, as a maximum, 20\% of the pixels will be randomized. On the lowest extreme, no pixel is changed.
177 179
178 This filter also has a probability of not being applied, at all, of 75\%. 180 This filter also has a probability of not being applied, at all, of 75\%.
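The pixel randomization described above might look like the following sketch. It assumes that exactly a fraction $complexity / 5$ of the pixels (rounded to the nearest integer) is drawn without replacement; the text does not specify the selection procedure, and the function name is hypothetical. The 75\% chance of skipping the filter is left to the caller.

```python
import random


def randomize_pixels(image, complexity, rng=random):
    """Replace a fraction complexity / 5 of the pixels by U(0, 1) values."""
    height, width = len(image), len(image[0])
    # Assumption: a fixed count of distinct pixels is selected.
    count = int(round(height * width * complexity / 5.0))
    noisy = [list(row) for row in image]
    for index in rng.sample(range(height * width), count):
        noisy[index // width][index % width] = rng.uniform(0.0, 1.0)
    return noisy
```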
179 181
180 \subsection{Spatially Gaussian Noise} 182 \subsection{Spatially Gaussian Noise}
183
181 The aim of this transformation is to filter, with a gaussian kernel, different regions of the image. In order to save computing time 184 The aim of this transformation is to filter, with a gaussian kernel, different regions of the image. In order to save computing time
182 we decided to convolve the whole image only once with a symmetric gaussian kernel of size and variance chosen uniformly in the ranges: 185 we decided to convolve the whole image only once with a symmetric gaussian kernel of size and variance chosen uniformly in the ranges:
183 $[12,12 + 20 \times complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized between $0$ and $1$. 186 $[12,12 + 20 \times complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized between $0$ and $1$.
184 We also create a symmetric averaging window, of the kernel size, with maximum value at the center. 187 We also create a symmetric averaging window, of the kernel size, with maximum value at the center.
185 For each image we sample uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be averaging centers 188 For each image we sample uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be averaging centers
189 192
190 This filter has a probability of not being applied, at all, of 75\%. 193 This filter has a probability of not being applied, at all, of 75\%.
191 194
192 195
193 \subsection{Color and Contrast Changes} 196 \subsection{Color and Contrast Changes}
197
194 This filter changes the contrast and may invert the image polarity (white on black to black on white). The contrast $C$ is defined here as the difference 198 This filter changes the contrast and may invert the image polarity (white on black to black on white). The contrast $C$ is defined here as the difference
195 between the maximum and the minimum pixel value of the image. A contrast value is sampled uniformly between $1$ and $1-0.85 \times complexity}$ 199 between the maximum and the minimum pixel value of the image. A contrast value is sampled uniformly between $1$ and $1-0.85 \times complexity$
196 (this ensures a minimum contrast of $0.15$). We then simply normalize the image to the range $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The polarity 200 (this ensures a minimum contrast of $0.15$). We then simply normalize the image to the range $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The polarity
197 is inverted with $0.5$ probability. 201 is inverted with $0.5$ probability.
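A minimal sketch of the contrast change, assuming the normalization is a linear rescaling of the pixel values and that inverting the polarity means replacing each value $v$ by $1 - v$. Neither detail is spelled out in the text, and the function names are hypothetical.

```python
import random


def sample_contrast(complexity, rng=random):
    """Draw C uniformly between 1 - 0.85 * complexity and 1."""
    return rng.uniform(1.0 - 0.85 * complexity, 1.0)


def change_contrast(image, contrast, invert=False):
    """Rescale the image linearly to [(1 - C) / 2, 1 - (1 - C) / 2]."""
    low = (1.0 - contrast) / 2.0
    flat = [pixel for row in image for pixel in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    result = []
    for row in image:
        new_row = []
        for pixel in row:
            # Assumption: a constant image is mapped to the middle of the range.
            unit = (pixel - lo) / span if span > 0 else 0.5
            value = low + unit * contrast  # width of the target range is C
            new_row.append(1.0 - value if invert else value)
        result.append(new_row)
    return result
```

In use, \texttt{invert} would be set to true with probability $0.5$. Since the target range is symmetric around $0.5$, the inverted image stays in the same range.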
198 202
199 203
200 \begin{figure}[h] 204 \begin{figure}[h]