ift6266: writeup/techreport.tex comparison

comparison writeup/techreport.tex @ 426:a7fab59de174

change order of transformations

author	Xavier Glorot <glorotxa@iro.umontreal.ca>
date	Fri, 30 Apr 2010 16:29:17 -0400
parents	c06a3d9b5664
children	ace489930918

comparison

equal deleted inserted replaced

-:c06a3d9b5664
+:a7fab59de174
 As displacement fields were long to compute, 50 pairs of fields were generated per complexity in increments of 0.1 (50 pairs for 0.1, 50 pairs for 0.2, etc.), and afterwards, given a complexity, we selected randomly among the 50 corresponding pairs.
 $\sigma$ and $\alpha$ were linked to complexity through the formulas $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times \sqrt[3]{complexity}$.
+\subsection{Pinch}
+This is another GIMP filter we used. The filter is in fact named "Whirl and pinch", but we don't use the "whirl" part (whirl is set to 0). As described in GIMP, a pinch is "similar to projecting the image onto an elastic surface and pressing or pulling on the center of the surface".
+Mathematically, for a square input image, think of drawing a circle of radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to that disk (region inside circle) will have its value recalculated by taking the value of another "source" pixel in the original image. The position of that source pixel is found on the line thats goes through $C$ and $P$, but at some other distance $d_2$. Define $d_1$ to be the distance between $P$ and $C$. $d_2$ is given by $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times d_1$, where $pinch$ is a parameter to the filter.
+If the region considered is not square then, before computing $d_2$, the smallest dimension (x or y) is stretched such that we may consider the region as if it was square. Then, after $d_2$ has been computed and corresponding components $d_2\_x$ and $d_2\_y$ have been found, the component corresponding to the stretched dimension is compressed back by an inverse ratio.
+The actual value is given by bilinear interpolation considering the pixels around the (non-integer) source position thus found.
+The value for $pinch$ in our case was given by sampling from an uniform distribution over the range $[-complexity, 0.7 \times complexity]$.
 \subsection{Motion Blur}
 This is a GIMP filter we applied, a "linear motion blur" in GIMP terminology. The description will be brief as it is a well-known filter.
 This algorithm has two input parameters, $length$ and $angle$. The value of a pixel in the final image is the mean value of the $length$ first pixels found by moving in the $angle$ direction. An approximation of this idea is used, as we won't fall onto precise pixels by following that direction. This is done using the Bresenham line algorithm.
 The angle, in our case, is chosen from a uniform distribution over $[0,360]$ degrees. The length, though, depends on the complexity; it's sampled from a Gaussian distribution of mean 0 and standard deviation $\sigma = 3 \times complexity$.
-\subsection{Pinch}
+\subsection{Occlusion}
-This is another GIMP filter we used. The filter is in fact named "Whirl and pinch", but we don't use the "whirl" part (whirl is set to 0). As described in GIMP, a pinch is "similar to projecting the image onto an elastic surface and pressing or pulling on the center of the surface".
+This filter selects random parts of other (hereafter "occlusive") letter images and places them over the original letter (hereafter "occluded") image. To be more precise, having selected a subregion of the occlusive image and a desination position in the occluded image, to determine the final value for a given overlapping pixel, it selects whichever pixel is the lightest. As a reminder, the background value is 0, black, so the value nearest to 1 is selected.
-Mathematically, for a square input image, think of drawing a circle of radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to that disk (region inside circle) will have its value recalculated by taking the value of another "source" pixel in the original image. The position of that source pixel is found on the line thats goes through $C$ and $P$, but at some other distance $d_2$. Define $d_1$ to be the distance between $P$ and $C$. $d_2$ is given by $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times d_1$, where $pinch$ is a parameter to the filter.
+To select a subpart of the occlusive image, four numbers are generated. For compability with the code, we'll call them "haut", "bas", "gauche" and "droite" (respectively meaning top, bottom, left and right). Each of these numbers is selected according to a Gaussian distribution of mean $8 \times complexity$ and standard deviation $2$. This means the largest the complexity is, the biggest the occlusion will be. The absolute value is taken, as the numbers must be positive, and the maximum value is capped at 15.
-If the region considered is not square then, before computing $d_2$, the smallest dimension (x or y) is stretched such that we may consider the region as if it was square. Then, after $d_2$ has been computed and corresponding components $d_2\_x$ and $d_2\_y$ have been found, the component corresponding to the stretched dimension is compressed back by an inverse ratio.
+These four sizes collectively define a window centered on the middle pixel of the occlusive image. This is the part that will be extracted as the occlusion.
-The actual value is given by bilinear interpolation considering the pixels around the (non-integer) source position thus found.
+The next step is to select a destination position in the occluded image. Vertical and horizontal displacements $y\_arrivee$ and $x\_arrivee$ are selected according to Gaussian distributions of mean 0 and of standard deviations of, respectively, 3 and 2. Then an horizontal placement mode, $endroit$ (meaning location), is selected to be of three values meaning left, middle or right.
-The value for $pinch$ in our case was given by sampling from an uniform distribution over the range $[-complexity, 0.7 \times complexity]$.
+If $endroit$ is "middle", the occlusion will be horizontally centered around the horizontal middle of the occluded image, then shifted according to $x\_arrivee$. If $endroit$ is "left", it will be placed on the left of the occluded image, then displaced right according to $x\_arrivee$. The contrary happens if $endroit$ is $right$.
+In both the horizontal and vertical positionning, the maximum position in either direction is such that the selected occlusion won't go beyond the borders of the occluded image.
+This filter has a probability of not being applied, at all, of 60\%.
 \subsection{Distorsion gauss}
 This filter simply adds, to each pixel of the image independently, a gaussian noise of mean $0$ and standard deviation $\frac{complexity}{10}$.
 It has has a probability of not being applied, at all, of 70\%.
-\subsection{Occlusion}
-This filter selects random parts of other (hereafter "occlusive") letter images and places them over the original letter (hereafter "occluded") image. To be more precise, having selected a subregion of the occlusive image and a desination position in the occluded image, to determine the final value for a given overlapping pixel, it selects whichever pixel is the lightest. As a reminder, the background value is 0, black, so the value nearest to 1 is selected.
-To select a subpart of the occlusive image, four numbers are generated. For compability with the code, we'll call them "haut", "bas", "gauche" and "droite" (respectively meaning top, bottom, left and right). Each of these numbers is selected according to a Gaussian distribution of mean $8 \times complexity$ and standard deviation $2$. This means the largest the complexity is, the biggest the occlusion will be. The absolute value is taken, as the numbers must be positive, and the maximum value is capped at 15.
-These four sizes collectively define a window centered on the middle pixel of the occlusive image. This is the part that will be extracted as the occlusion.
-The next step is to select a destination position in the occluded image. Vertical and horizontal displacements $y\_arrivee$ and $x\_arrivee$ are selected according to Gaussian distributions of mean 0 and of standard deviations of, respectively, 3 and 2. Then an horizontal placement mode, $endroit$ (meaning location), is selected to be of three values meaning left, middle or right.
-If $endroit$ is "middle", the occlusion will be horizontally centered around the horizontal middle of the occluded image, then shifted according to $x\_arrivee$. If $endroit$ is "left", it will be placed on the left of the occluded image, then displaced right according to $x\_arrivee$. The contrary happens if $endroit$ is $right$.
-In both the horizontal and vertical positionning, the maximum position in either direction is such that the selected occlusion won't go beyond the borders of the occluded image.
-This filter has a probability of not being applied, at all, of 60\%.
 \subsection{Background Images}
 This transformation adds a random background behind the letter. The background is chosen by first selecting, at random, an image from a set of images. Then we choose a 32x32 subregion of that image as the background image (by sampling x and y positions uniformly while making sure not to cross image borders).
 To combine the original letter image and the background image, contrast adjustments are made. We first get the maximal values (i.e. maximal intensity) for both the original image and the background image, $maximage$ and $maxbg$. We also have a parameter, $contrast$, given by sampling from a uniform distribution over $[complexity, 1]$.
 This filter adds noise to the image by randomly selecting a certain number of them and, for those selected pixels, assign a random value according to a uniform distribution over the $[0,1]$ ranges. This last distribution does not change according to complexity. Instead, the number of selected pixels does: the proportion of changed pixels corresponds to $complexity / 5$, which means, as a maximum, 20\% of the pixels will be randomized. On the lowest extreme, no pixel is changed.
 This filter also has a probability of not being applied, at all, of 75\%.
 \subsection{Spatially Gaussian Noise}
 The aim of this transformation is to filter, with a gaussian kernel, different regions of the image. In order to save computing time
 we decided to convolve the whole image only once with a symmetric gaussian kernel of size and variance choosen uniformly in the ranges:
 $[12,12 + 20 \times complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized between $0$ and $1$.
 We also create a symmetric averaging window, of the kernel size, with maximum value at the center.
 For each image we sample uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be averaging centers
 This filter has a probability of not being applied, at all, of 75\%.
 \subsection{Color and Contrast Changes}
 This filter changes the constrast and may invert the image polarity (white on black to black on white). The contrast $C$ is defined here as the difference
-between the maximum and the minimum pixel value of the image. A contrast value is sampled uniformly between $1$ and $1-0.85 \times complexity}$
+between the maximum and the minimum pixel value of the image. A contrast value is sampled uniformly between $1$ and $1-0.85 \times complexity$
 (this insure a minimum constrast of $0.15$). We then simply normalize the image to the range $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The polarity
 is inverted with $0.5$ probability.
 \begin{figure}[h]

Mercurial > ift6266

comparison writeup/techreport.tex @ 426:a7fab59de174