% ift6266: writeup/techreport.tex at revision 426:a7fab59de174, compared with its parent 425:c06a3d9b5664
% Commit message: change order of transformations
% Author: Xavier Glorot <glorotxa@iro.umontreal.ca>
% Date: Fri, 30 Apr 2010 16:29:17 -0400
% Parent: c06a3d9b5664; child: ace489930918

Because displacement fields were expensive to compute, 50 pairs of fields were precomputed for each complexity level, in increments of 0.1 (50 pairs for 0.1, 50 pairs for 0.2, etc.). Afterwards, given a complexity, one of the 50 corresponding pairs was selected at random.

$\sigma$ and $\alpha$ are related to the complexity through the formulas $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times \sqrt[3]{complexity}$.

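As an illustration, the following Python sketch shows the mapping from complexity to $\alpha$ and $\sigma$ and the lookup in the bank of precomputed field pairs. The function names and the \texttt{bank} structure are hypothetical. Here \texttt{bank} is a dictionary keyed by the complexity level in tenths. Rounding a complexity to the nearest 0.1 level is also our assumption, because the text does not say how a complexity is mapped to a level.

```python
import random


def elastic_params(complexity):
    """Return (alpha, sigma) for a complexity in [0, 1], using the formulas above."""
    c = complexity ** (1.0 / 3.0)  # cube root of the complexity
    alpha = 10.0 * c
    sigma = 10.0 - 7.0 * c
    return alpha, sigma


def pick_field_pair(bank, complexity, rng=random):
    """Pick one precomputed displacement-field pair at random.

    `bank` is a hypothetical dict mapping a level k (complexity k / 10) to its
    list of 50 precomputed field pairs. Rounding the complexity to the nearest
    tenth is an assumption.
    """
    level = int(round(complexity * 10))
    return rng.choice(bank[level])
```

For example, a complexity of $0.125$ has cube root $0.5$, which gives $\alpha = 5$ and $\sigma = 6.5$. At complexity 1, $\alpha = 10$ and $\sigma = 3$.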

\subsection{Pinch}

This is another GIMP filter we used. The filter is actually named "Whirl and pinch", but we do not use the "whirl" part (whirl is set to 0). As described in GIMP, a pinch is "similar to projecting the image onto an elastic surface and pressing or pulling on the center of the surface".

Mathematically, for a square input image, consider a circle of radius $r$ around a center point $C$. Any point (pixel) $P$ inside that disk has its value recomputed by taking the value of another "source" pixel in the original image. This source pixel lies on the line that goes through $C$ and $P$, but at some other distance $d_2$ from $C$. Let $d_1$ be the distance between $P$ and $C$. Then $d_2$ is given by $d_2 = \sin\left(\frac{\pi d_1}{2r}\right)^{-pinch} \times d_1$, where $pinch$ is a parameter of the filter.

If the region considered is not square, the smaller dimension (x or y) is first stretched so that the region can be treated as square, and only then is $d_2$ computed. After $d_2$ has been computed and its components $d_{2,x}$ and $d_{2,y}$ have been found, the component along the stretched dimension is compressed back by the inverse ratio.

The actual value is obtained by bilinear interpolation of the pixels around the (non-integer) source position thus found.

The value of $pinch$ was sampled from a uniform distribution over the range $[-complexity, 0.7 \times complexity]$.
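The NumPy sketch below reproduces the pinch mapping described above: the source distance $d_2$, the stretching of the smaller dimension and the bilinear interpolation. It is our own reconstruction and not GIMP's source code. The following are all assumptions: the disk radius $r$ taken as half of the stretched side, clamping of source positions at the image borders, and the function names.

```python
import numpy as np


def bilinear_sample(img, y, x):
    """Bilinearly interpolate `img` at real-valued positions (y, x), clamped to the borders."""
    h, w = img.shape
    y = np.clip(y, 0, h - 1)
    x = np.clip(x, 0, w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def pinch(image, pinch_amount):
    """Pinch `image` around its center using d2 = sin(pi * d1 / (2r)) ** (-pinch) * d1."""
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    side = max(h, w)
    sy, sx = side / h, side / w  # stretch the smaller dimension so the region is square
    r = side / 2.0               # assumption: the disk fills the stretched region
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    dy, dx = (ys - cy) * sy, (xs - cx) * sx
    d1 = np.hypot(dx, dy)
    inside = (d1 > 0) & (d1 < r)  # the center and pixels outside the disk are unchanged
    factor = np.ones_like(d1)     # ratio d2 / d1
    factor[inside] = np.sin(np.pi * d1[inside] / (2 * r)) ** (-pinch_amount)
    # Scale each offset to distance d2, then compress the stretched component back.
    src_y = cy + dy * factor / sy
    src_x = cx + dx * factor / sx
    return bilinear_sample(img, src_y, src_x)


def sample_pinch(complexity, rng):
    """Sample the pinch parameter uniformly over [-complexity, 0.7 * complexity]."""
    return rng.uniform(-complexity, 0.7 * complexity)
```

With $pinch = 0$ the factor is 1 everywhere, so the image is returned unchanged. Positive values read each pixel from farther away from $C$, and negative values read from closer to it.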

\subsection{Motion Blur}

This is a GIMP filter we applied, a "linear motion blur" in GIMP terminology. The description will be brief, as it is a well-known filter.

This algorithm has two input parameters, $length$ and $angle$. The value of a pixel in the final image is the mean value of the first $length$ pixels found by moving in the $angle$ direction. Following that direction does not generally land exactly on pixels, so an approximation is used: the pixels are selected with the Bresenham line algorithm.

The angle is chosen from a uniform distribution over $[0,360]$ degrees. The length, however, depends on the complexity: it is sampled from a Gaussian distribution with mean 0 and standard deviation $\sigma = 3 \times complexity$.
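The linear motion blur and its parameter sampling could be sketched as follows. We make several assumptions here. The pixel itself counts as the first of the $length$ pixels. The $y$ axis points down the rows. A negative sampled length is used through its magnitude, because the text does not say how the sign is handled. The function names are hypothetical, and this is not GIMP's implementation.

```python
import math

import numpy as np


def bresenham(x0, y0, x1, y1):
    """Integer points of the segment (x0, y0) -> (x1, y1), both endpoints included."""
    points = []
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        points.append((x0, y0))
        if x0 == x1 and y0 == y1:
            return points
        e2 = 2 * err
        if e2 >= dy:
            err += dy
            x0 += sx
        if e2 <= dx:
            err += dx
            y0 += sy


def motion_blur(image, length, angle_deg):
    """Replace each pixel by the mean of the first `length` pixels along `angle_deg`."""
    img = np.asarray(image, dtype=float)
    n = int(round(abs(length)))  # assumption: a negative length uses its magnitude
    if n <= 1:
        return img.copy()
    theta = math.radians(angle_deg)
    # Trace a line long enough to contain at least n Bresenham points, keep the first n.
    far = 2 * n
    end_x = int(round(far * math.cos(theta)))
    end_y = int(round(far * math.sin(theta)))  # assumption: +y points down the rows
    offsets = bresenham(0, 0, end_x, end_y)[:n]
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros_like(img)
    for ox, oy in offsets:
        # Positions past the border reuse the nearest border pixel.
        out += img[np.clip(ys + oy, 0, h - 1), np.clip(xs + ox, 0, w - 1)]
    return out / len(offsets)


def sample_motion_blur_params(complexity, rng):
    """angle ~ U[0, 360] degrees, length ~ Gaussian(mean 0, std 3 * complexity)."""
    angle = rng.uniform(0.0, 360.0)
    length = rng.normal(0.0, 3.0 * complexity)
    return length, angle
```

The line is traced to $2 \times length$ before truncation because a diagonal Bresenham segment has fewer points than its Euclidean length.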

\subsection{Occlusion}

This filter selects random parts of other (hereafter "occlusive") letter images and places them over the original letter (hereafter "occluded") image. More precisely, the filter first selects a subregion of the occlusive image and a destination position in the occluded image. It then sets each overlapping pixel to whichever of the two pixels is lighter. As a reminder, the background value is 0 (black), so the value nearest to 1 is selected.

To select a subpart of the occlusive image, four numbers are generated. For compatibility with the code, we call them "haut", "bas", "gauche" and "droite" (respectively meaning top, bottom, left and right). Each of these numbers is sampled from a Gaussian distribution with mean $8 \times complexity$ and standard deviation $2$, so the larger the complexity, the bigger the occlusion. The absolute value is taken, as the numbers must be non-negative, and the maximum value is capped at 15.

These four sizes collectively define a window centered on the middle pixel of the occlusive image. This window is extracted as the occlusion.
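A minimal sketch of the window selection follows. It assumes that each size counts pixels on its side of the middle pixel, so the window includes the middle pixel itself, and that the middle pixel of an $h \times w$ image is at index $(h/2, w/2)$ rounded down. Truncating the sampled sizes to whole pixels and the function names are also our assumptions, not necessarily the conventions of the original code.

```python
import numpy as np


def sample_window_sizes(complexity, rng):
    """Sample haut, bas, gauche, droite as |Gaussian(8 * complexity, std 2)|, capped at 15."""
    sizes = np.minimum(np.abs(rng.normal(8.0 * complexity, 2.0, size=4)), 15.0)
    haut, bas, gauche, droite = (int(s) for s in sizes)  # assumption: truncate to pixels
    return haut, bas, gauche, droite


def extract_occlusion(occlusive, haut, bas, gauche, droite):
    """Cut out the window reaching haut/bas rows and gauche/droite columns from the middle."""
    h, w = occlusive.shape
    cy, cx = h // 2, w // 2  # assumption: the "middle pixel" is (h // 2, w // 2)
    top, bottom = max(cy - haut, 0), min(cy + bas + 1, h)
    left, right = max(cx - gauche, 0), min(cx + droite + 1, w)
    return occlusive[top:bottom, left:right].copy()
```

With the cap at 15, a window extracted from a $32 \times 32$ image never needs the clipping, since $16 - 15 \geq 0$ and $16 + 15 + 1 \leq 32$.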

The next step is to select a destination position in the occluded image. Vertical and horizontal displacements $y\_arrivee$ and $x\_arrivee$ are sampled from Gaussian distributions with mean 0 and standard deviations of 3 and 2, respectively. Then a horizontal placement mode, $endroit$ (meaning location), is chosen among three values: left, middle or right.

If $endroit$ is "middle", the occlusion is horizontally centered on the horizontal middle of the occluded image, then shifted according to $x\_arrivee$. If $endroit$ is "left", it is placed against the left side of the occluded image, then displaced to the right according to $x\_arrivee$. The opposite happens if $endroit$ is "right".

For both horizontal and vertical positioning, the position is bounded in each direction so that the selected occlusion never goes beyond the borders of the occluded image.

This filter is not applied at all with a probability of 60\%.
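The placement step and the rule that the lighter pixel wins can be sketched as follows, taking an already extracted occlusion patch as input. Several choices here are assumptions not stated in the text: $endroit$ is drawn uniformly, the patch is centered vertically before the $y\_arrivee$ shift, and positions are rounded to the nearest pixel and then clamped inside the borders. The function name is hypothetical.

```python
import numpy as np


def occlude(occluded, patch, rng, skip_prob=0.6):
    """Paste `patch` over `occluded`, keeping the lighter pixel wherever they overlap."""
    img = np.asarray(occluded, dtype=float).copy()
    if rng.random() < skip_prob:  # the filter is skipped with probability 60%
        return img
    H, W = img.shape
    ph, pw = patch.shape
    y_arrivee = rng.normal(0.0, 3.0)
    x_arrivee = rng.normal(0.0, 2.0)
    endroit = rng.choice(["left", "middle", "right"])  # assumption: uniform choice
    if endroit == "middle":
        x = (W - pw) / 2.0 + x_arrivee
    elif endroit == "left":
        x = x_arrivee             # against the left border, displaced by x_arrivee
    else:
        x = (W - pw) - x_arrivee  # against the right border, displaced the other way
    y = (H - ph) / 2.0 + y_arrivee  # assumption: vertically centered, then shifted
    # Clamp so that the occlusion never goes beyond the borders of the occluded image.
    y = int(np.clip(round(y), 0, H - ph))
    x = int(np.clip(round(x), 0, W - pw))
    region = img[y:y + ph, x:x + pw]
    img[y:y + ph, x:x + pw] = np.maximum(region, patch)  # background is 0: lighter wins
    return img
```

Because the combination is a pixelwise maximum, an occlusion can only brighten the occluded image, never darken an existing stroke.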


\subsection{Gaussian Distortion}

This filter simply adds Gaussian noise with mean $0$ and standard deviation $\frac{complexity}{10}$ to each pixel of the image independently.

It is not applied at all with a probability of 70\%.
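A minimal sketch of this filter, including the 70\% chance of skipping it, is given below. The function name is hypothetical. The text does not mention clipping the result to $[0,1]$, so this sketch does not clip.

```python
import numpy as np


def gaussian_distortion(image, complexity, rng, skip_prob=0.7):
    """Add i.i.d. Gaussian noise (mean 0, std complexity / 10) to every pixel."""
    img = np.asarray(image, dtype=float)
    if rng.random() < skip_prob:  # the filter is skipped with probability 70%
        return img.copy()
    return img + rng.normal(0.0, complexity / 10.0, size=img.shape)
```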
\subsection{Background Images}

This transformation adds a random background behind the letter. The background is chosen by first selecting, at random, an image from a set of images. Then a $32 \times 32$ subregion of that image is chosen as the background image, by sampling x and y positions uniformly while making sure not to cross the image borders.

To combine the original letter image and the background image, contrast adjustments are made. We first get the maximal values (i.e.\ the maximal intensities) of the original image and of the background image, $maximage$ and $maxbg$. We also have a parameter, $contrast$, sampled from a uniform distribution over $[complexity, 1]$.
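The background selection and the quantities defined above can be sketched as follows. The formula that actually combines the letter and the background is not given in this excerpt, so the sketch stops at computing $maximage$, $maxbg$ and $contrast$. The function names and the list-of-arrays representation of the image set are assumptions.

```python
import numpy as np


def random_background_crop(backgrounds, rng, size=32):
    """Pick a background image at random, then a size x size crop inside its borders."""
    bg = backgrounds[rng.integers(len(backgrounds))]
    H, W = bg.shape
    y = rng.integers(0, H - size + 1)  # the upper bound is exclusive
    x = rng.integers(0, W - size + 1)
    return bg[y:y + size, x:x + size]


def contrast_inputs(image, background, complexity, rng):
    """Return maximage, maxbg and contrast ~ U[complexity, 1]."""
    maximage = float(np.max(image))
    maxbg = float(np.max(background))
    contrast = rng.uniform(complexity, 1.0)
    return maximage, maxbg, contrast
```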
This filter adds noise to the image by randomly selecting a certain number of pixels and assigning each selected pixel a random value drawn from a uniform distribution over the range $[0,1]$. This last distribution does not change with the complexity. Instead, the number of selected pixels does: the proportion of changed pixels is $complexity / 5$. At the maximum complexity of 1, 20\% of the pixels are randomized, and at complexity 0 no pixel is changed.

This filter is also not applied at all with a probability of 75\%.
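This filter can be sketched as follows. The function name is hypothetical. Rounding the number of selected pixels to the nearest integer and drawing distinct pixels are our assumptions.

```python
import numpy as np


def random_pixel_noise(image, complexity, rng, skip_prob=0.75):
    """Replace a fraction complexity / 5 of the pixels by U[0, 1] values."""
    img = np.asarray(image, dtype=float).copy()
    if rng.random() < skip_prob:  # the filter is skipped with probability 75%
        return img
    n = int(round(img.size * complexity / 5.0))       # number of pixels to randomize
    idx = rng.choice(img.size, size=n, replace=False)  # distinct pixel indices
    img.flat[idx] = rng.uniform(0.0, 1.0, size=n)
    return img
```

On a $32 \times 32$ image at complexity 1, this randomizes $\mathrm{round}(1024 / 5) = 205$ pixels.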

\subsection{Spatially Gaussian Noise}

The aim of this transformation is to filter different regions of the image with a Gaussian kernel. To save computing time,
we decided to convolve the whole image only once with a symmetric Gaussian kernel whose size and variance are chosen uniformly in the ranges
$[12,12 + 20 \times complexity]$ and $[2,2 + 6 \times complexity]$, respectively. The result is normalized between $0$ and $1$.
We also create a symmetric averaging window, of the kernel size, with its maximum value at the center.
For each image, we sample uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be averaging centers

This filter is not applied at all with a probability of 75\%.
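The sketch below covers only the first steps described above: blurring the whole image once, normalizing the result to $[0,1]$, and picking the averaging centers. The shape of the averaging window and the way the centers are used are not reproduced here. Treating the kernel size as an integer, using zero padding at the borders, drawing distinct center pixels, and the function names are all assumptions.

```python
import numpy as np


def gaussian_kernel_1d(size, variance):
    """Normalized 1-D Gaussian; the symmetric 2-D kernel is its outer product."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax ** 2 / (2.0 * variance))
    return g / g.sum()


def convolve_separable(img, g):
    """Zero-padded, same-size convolution of `img` with the separable kernel outer(g, g)."""
    n = len(g)
    before, after = (n - 1) // 2, n // 2
    padded = np.pad(img, ((before, after), (before, after)))
    rows = np.apply_along_axis(lambda r: np.convolve(r, g, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, g, mode="valid"), 0, rows)


def spatial_noise_setup(image, complexity, rng):
    """Blur the whole image once, normalize it to [0, 1], and pick the averaging centers."""
    img = np.asarray(image, dtype=float)
    size = int(rng.integers(12, 12 + int(20 * complexity) + 1))  # assumption: integer size
    variance = rng.uniform(2.0, 2.0 + 6.0 * complexity)
    blurred = convolve_separable(img, gaussian_kernel_1d(size, variance))
    lo, hi = blurred.min(), blurred.max()
    blurred = (blurred - lo) / (hi - lo) if hi > lo else np.zeros_like(blurred)
    n_centers = int(rng.integers(3, 3 + int(10 * complexity) + 1))
    h, w = img.shape
    flat = rng.choice(h * w, size=n_centers, replace=False)  # assumption: distinct pixels
    centers = [(int(i) // w, int(i) % w) for i in flat]
    return blurred, centers
```

Because the symmetric Gaussian kernel is separable, two 1-D passes replace one 2-D convolution. This follows the same cost-saving spirit as blurring the image only once.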


\subsection{Color and Contrast Changes}

This filter changes the contrast and may invert the image polarity (white on black to black on white). The contrast $C$ is defined here as the difference
between the maximum and the minimum pixel value of the image. A contrast value is sampled uniformly between $1$ and $1-0.85 \times complexity$,
which ensures a minimum contrast of $0.15$. We then simply normalize the image to the range $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The polarity
is inverted with probability $0.5$.
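A minimal sketch of this filter is given below. Linear min-max rescaling to the target range and mapping a constant image to the lower bound are our assumptions, and the function name is hypothetical. The target range is symmetric around $0.5$, so inverting the polarity with $1 - x$ keeps the image within the same range.

```python
import numpy as np


def color_contrast_change(image, complexity, rng):
    """Rescale to contrast C ~ U[1 - 0.85 * complexity, 1], then invert with probability 0.5."""
    img = np.asarray(image, dtype=float)
    C = rng.uniform(1.0 - 0.85 * complexity, 1.0)
    lo, hi = (1.0 - C) / 2.0, 1.0 - (1.0 - C) / 2.0  # target range, symmetric around 0.5
    mn, mx = img.min(), img.max()
    unit = (img - mn) / (mx - mn) if mx > mn else np.zeros_like(img)  # assumption for flat images
    out = lo + unit * (hi - lo)
    if rng.random() < 0.5:  # invert polarity
        out = 1.0 - out
    return out
```

The resulting contrast $hi - lo$ equals $C$, and $lo + hi = 1$ with or without inversion.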


\begin{figure}[h]