# HG changeset patch
# User fsavard
# Date 1272576108 14400
# Node ID 1e9788ce16804ea6903de5062f2fa245d4284a83
# Parent 3dba84c0fbc11efd26d1324d55a103dcaa07699a
Added the parts concerning the transformations I'd announced I'd do: Local elastic deformations; occlusions; gimp transformations; salt and pepper noise; background images

diff -r 3dba84c0fbc1 -r 1e9788ce1680 deep/crbm/mnist_config.py.example
--- a/deep/crbm/mnist_config.py.example Thu Apr 29 17:04:12 2010 -0400
+++ b/deep/crbm/mnist_config.py.example Thu Apr 29 17:21:48 2010 -0400
@@ -74,7 +74,8 @@
 # print series to stdout too (otherwise just produce the HDF5 file)
 SERIES_STDOUT_TOO = False
 
-VISUALIZE_EVERY = 20000
+# every X minibatches
+VISUALIZE_EVERY = 1000 # x20, ie. every 20,000 examples
 GIBBS_STEPS_IN_VIZ_CHAIN = 1000
 
 if TEST_CONFIG:
diff -r 3dba84c0fbc1 -r 1e9788ce1680 writeup/techreport.tex
--- a/writeup/techreport.tex Thu Apr 29 17:04:12 2010 -0400
+++ b/writeup/techreport.tex Thu Apr 29 17:21:48 2010 -0400
@@ -92,10 +92,72 @@
 to the image is low enough not to confuse classes.
 
 \subsection{Local Elastic Deformations}
-\subsection{GIMP transformation}
+
+This filter induces a ``wiggly'' effect in the image. The description here will be brief, as the algorithm follows precisely what is described in~\cite{}.
+
+The general idea is to generate two ``displacement'' fields, one for the horizontal and one for the vertical displacement of pixels. Each of these fields has the same size as the original image.
+
+To generate the transformed image, we loop over the $(x, y)$ positions of the fields and set the output pixel to the value of the input pixel found at the position given, relative to $(x, y)$, by the two displacement fields at $(x, y)$. If that source position falls outside the borders of the image, a value of 0 is used instead.
+
+To generate the fields, each of their pixels is first drawn from a uniform distribution over $[-1, 1]$. All the pixels, in both fields, are then multiplied by a constant $\alpha$ which controls the intensity of the displacements (a larger $\alpha$ translates into larger wiggles).
+
+As a final step, each field is convolved with a 2D Gaussian kernel of standard deviation $\sigma$. Visually, this amounts to a ``blur'' filter. It makes neighboring values in the displacement fields similar, so that the wiggles are more coherent and less noisy.
+
+As the displacement fields were time-consuming to compute, 50 pairs of fields were pre-generated for each complexity level, in increments of 0.1 (50 pairs for 0.1, 50 pairs for 0.2, etc.); afterwards, given a complexity, we select at random among the 50 corresponding pairs.
+
+$\sigma$ and $\alpha$ were linked to complexity through the formulas $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times \sqrt[3]{complexity}$.
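+
+To make the procedure concrete, the following simplified sketch (using numpy and scipy's Gaussian filter) shows one way such displacement fields could be generated and applied; it is only meant as an illustration of the steps above, not necessarily the exact code used for our experiments, and the function name is purely illustrative.
+
+\begin{verbatim}
+# Illustrative sketch only -- not the exact experimental code.
+import numpy as np
+from scipy.ndimage import gaussian_filter
+
+def elastic_deformation(image, complexity, rng=np.random):
+    # image: 2D array with values in [0, 1]
+    alpha = complexity ** (1 / 3.0) * 10.0
+    sigma = 10 - 7 * complexity ** (1 / 3.0)
+
+    # Uniform values in [-1, 1], scaled by alpha, then smoothed with a
+    # 2D Gaussian kernel of standard deviation sigma.
+    dx = gaussian_filter(rng.uniform(-1, 1, image.shape) * alpha, sigma)
+    dy = gaussian_filter(rng.uniform(-1, 1, image.shape) * alpha, sigma)
+
+    out = np.zeros_like(image)
+    h, w = image.shape
+    for y in range(h):
+        for x in range(w):
+            src_x = int(round(x + dx[y, x]))
+            src_y = int(round(y + dy[y, x]))
+            if 0 <= src_x < w and 0 <= src_y < h:
+                out[y, x] = image[src_y, src_x]  # outside borders -> stays 0
+    return out
+\end{verbatim}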
+
+\subsection{Motion Blur}
+
+This is a GIMP filter we applied, a ``linear motion blur'' in GIMP terminology. The description will be brief, as it is a well-known filter.
+
+The filter has two input parameters, $length$ and $angle$. The value of a pixel in the final image is the mean value of the first $length$ pixels found by moving in the $angle$ direction, starting from that pixel. Since following an arbitrary direction does not land exactly on pixel centers, an approximation based on the Bresenham line algorithm is used.
+
+In our case, the angle is sampled from a uniform distribution over $[0,360]$ degrees, while the length depends on the complexity: it is sampled from a Gaussian distribution of mean 0 and standard deviation $\sigma = 3 \times complexity$.
+
+\subsection{Pinch}
+
+This is another GIMP filter we used. The filter is in fact named ``Whirl and pinch'', but we do not use the ``whirl'' part (whirl is set to 0). As described in GIMP, a pinch is ``similar to projecting the image onto an elastic surface and pressing or pulling on the center of the surface''.
+
+Mathematically, think of drawing a circle of radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to that disk (the region inside the circle) has its value recalculated by taking the value of another ``source'' pixel in the original image. The position of that source pixel is found on the line that goes through $C$ and $P$, but at some other distance $d_2$ from the center. If $d_1$ is the distance between $P$ and $C$, then $d_2$ is given by $d_2 = \sin\left(\frac{\pi d_1}{2r}\right)^{-pinch}$, where $pinch$ is a parameter of the filter.
+
+The actual value is obtained by bilinear interpolation over the pixels surrounding the (non-integer) source position.
+
+The value of $pinch$ in our case is sampled from a uniform distribution over the range $[-complexity, 0.7 \times complexity]$.
+
 \subsection{Occlusion}
+
+This filter selects random parts of other (hereafter ``occlusive'') letter images and places them over the original (hereafter ``occluded'') letter image. More precisely, having selected a subregion of the occlusive image and a destination position in the occluded image, the final value of each overlapping pixel is the lighter of the two candidate pixels. As a reminder, the background value is 0 (black), so the value nearest to 1 is kept.
+
+To select a subpart of the occlusive image, four numbers are generated. For consistency with the code, we call them ``haut'', ``bas'', ``gauche'' and ``droite'' (respectively meaning top, bottom, left and right). Each of these numbers is drawn from a Gaussian distribution of mean $8 \times complexity$ and standard deviation $2$, so the larger the complexity, the larger the occlusion. The absolute value is taken, as the numbers must be positive, and each is capped at a maximum of 15.
+
+These four sizes collectively define a window centered on the middle pixel of the occlusive image. This is the part that will be extracted as the occlusion.
+
+The next step is to select a destination position in the occluded image. Vertical and horizontal displacements $y\_arrivee$ and $x\_arrivee$ are drawn from Gaussian distributions of mean 0 and of standard deviations of, respectively, 3 and 2. Then a horizontal placement mode, $endroit$ (meaning location), is selected among three values meaning left, middle or right.
+
+If $endroit$ is ``middle'', the occlusion is horizontally centered around the horizontal middle of the occluded image, then shifted according to $x\_arrivee$. If $endroit$ is ``left'', it is placed on the left of the occluded image, then displaced right according to $x\_arrivee$. The contrary happens if $endroit$ is ``right''.
+
+In both the horizontal and vertical positioning, the maximum displacement in either direction is chosen such that the selected occlusion will not go beyond the borders of the occluded image.
+
+This filter has a probability of 60\% of not being applied at all.
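+
+The core of this procedure can be sketched as follows, keeping only the ``middle'' placement mode and assuming $32 \times 32$ images with values in $[0,1]$; the code is purely illustrative (including the function and variable names) and is not necessarily the exact code used for our experiments.
+
+\begin{verbatim}
+# Illustrative sketch only -- simplified occlusion, "middle" mode only.
+import numpy as np
+
+def occlude(occluded, occlusive, complexity, rng=np.random):
+    # Both images: 2D arrays in [0, 1]; background is 0 (black).
+    if rng.uniform() < 0.6:       # 60% chance of not applying the filter
+        return occluded
+
+    # haut, bas, gauche, droite: window half-sizes around the middle
+    # pixel of the occlusive image (absolute value, capped at 15).
+    sizes = np.abs(rng.normal(8 * complexity, 2, size=4))
+    haut, bas, gauche, droite = np.minimum(sizes, 15).astype(int)
+
+    h, w = occlusive.shape
+    cy, cx = h // 2, w // 2
+    patch = occlusive[cy - haut:cy + bas, cx - gauche:cx + droite]
+
+    # Destination: centered, then shifted by y_arrivee / x_arrivee and
+    # clipped so the patch stays inside the occluded image.
+    H, W = occluded.shape
+    ph, pw = patch.shape
+    y0 = int(np.clip((H - ph) // 2 + int(rng.normal(0, 3)), 0, H - ph))
+    x0 = int(np.clip((W - pw) // 2 + int(rng.normal(0, 2)), 0, W - pw))
+
+    out = occluded.copy()
+    region = out[y0:y0 + ph, x0:x0 + pw]
+    # Keep whichever pixel is lighter (nearer to 1).
+    out[y0:y0 + ph, x0:x0 + pw] = np.maximum(region, patch)
+    return out
+\end{verbatim}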
+
 \subsection{Background Images}
+
+This transformation adds a random background behind the letter. The background is chosen by first selecting, at random, an image from a set of images. A $32 \times 32$ subregion of that image is then chosen as the background (by sampling $x$ and $y$ positions uniformly while making sure not to cross the image borders).
+
+To combine the original letter image and the background image, contrast adjustments are made. We first get the maximal values (i.e. maximal intensity) of both the original image and the background image, $maximage$ and $maxbg$. We also have a parameter, $contrast$, sampled from a uniform distribution over $[complexity, 1]$.
+
+Once we have all these numbers, the background image values are adjusted: each pixel value is multiplied by $\frac{\max(maximage - contrast, 0)}{maxbg}$. Therefore, the higher the contrast, the darker the background will be.
+
+The final image is obtained by taking, at each position, the brightest (i.e. nearest to 1) of the background pixel and the corresponding pixel of the original image.
+
 \subsection{Salt and Pepper Noise}
+
+This filter adds noise to the image by randomly selecting a certain number of pixels and assigning each of them a random value drawn from a uniform distribution over the $[0,1]$ range (an illustrative sketch is given at the end of this section). This distribution does not change with complexity; instead, the number of selected pixels does: the proportion of changed pixels is $complexity / 5$, which means that at most 20\% of the pixels are randomized, and at the lowest extreme no pixel is changed.
+
+This filter also has a probability of 25\% of not being applied at all.
+
 \subsection{Spatially Gaussian Noise}
 \subsection{Color and Contrast Changes}
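+
+As an illustration of the salt-and-pepper noise described above, a minimal sketch could look like the following; it assumes numpy arrays with values in $[0,1]$, the function name is purely illustrative, and it is not necessarily the exact code used for our experiments.
+
+\begin{verbatim}
+# Illustrative sketch of the salt-and-pepper noise -- not the exact code.
+import numpy as np
+
+def salt_and_pepper(image, complexity, rng=np.random):
+    # image: 2D array with values in [0, 1]
+    if rng.uniform() < 0.25:      # 25% chance of not applying the filter
+        return image
+    out = image.copy()
+    # The proportion of randomized pixels is complexity / 5,
+    # i.e. at most 20% of the pixels.
+    mask = rng.uniform(size=image.shape) < complexity / 5.0
+    out[mask] = rng.uniform(0, 1, size=int(mask.sum()))
+    return out
+\end{verbatim}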