diff writeup/nips2010_submission.tex @ 474:bcf024e6ab23
fits now, but still no graphics
author   | Yoshua Bengio <bengioy@iro.umontreal.ca>
date     | Sun, 30 May 2010 11:18:11 -0400
parents  | 2dd6e8962df1
children | db28764b8252
--- a/writeup/nips2010_submission.tex	Sun May 30 10:44:40 2010 -0400
+++ b/writeup/nips2010_submission.tex	Sun May 30 11:18:11 2010 -0400
@@ -121,7 +121,7 @@
 part, from blur to contrast, adds different kinds of noise.
 {\large\bf Transformations}\\
-{\bf Slant}\\
+{\bf Slant.}
 We mimic slant by shifting each row of the image
 proportionnaly to its height: $shift = round(slant \times height)$.
 The $slant$ coefficient can be negative or positive with equal probability
@@ -129,7 +129,7 @@
 e $slant \sim U[0,complexity]$, so the
 maximum displacement for the lowest or highest pixel line is of
 $round(complexity \times 32)$.\\
-{\bf Thickness}\\
+{\bf Thickness.}
 Morpholigical operators of dilation and erosion~\citep{Haralick87,Serra82}
 are applied. The neighborhood of each pixel is multiplied
 element-wise with a {\em structuring element} matrix.
@@ -143,7 +143,7 @@
 chosen no transformation is applied. Erosion allows only the six
 smallest structural elements because when the character is too thin it may
 be completely erased.\\
-{\bf Affine Transformations}\\
+{\bf Affine Transformations.}
 A $2 \times 3$ affine transform matrix (with 6 parameters
 $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
 Each pixel $(x,y)$ of the output image takes the value of the pixel
@@ -155,7 +155,7 @@
 complexity,1+3 \times complexity]$, $b$ and $e$ $\sim[-3 \times complexity,3
 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times
 complexity]$.\\
-{\bf Local Elastic Deformations}\\
+{\bf Local Elastic Deformations.}
 This filter induces a "wiggly" effect in the image,
 following~\citet{SimardSP03}, which provides more details.
 Two "displacements" fields are generated and applied, for horizontal
@@ -168,7 +168,7 @@
 standard deviation $\sigma$. Visually, this results in a blur.
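The Slant transformation described in the hunk above (each row shifted horizontally in proportion to its height, with the sign of the coefficient chosen uniformly) can be sketched in NumPy. The function name, the square-image assumption, and zero-padding at the borders are illustrative choices, not taken from the authors' code.

```python
import numpy as np

def slant_image(image, complexity, rng):
    """Shift each row of `image` horizontally by round(slant * row_index),
    where slant ~ U[0, complexity] with a random sign. Illustrative sketch
    of the Slant transformation; pixels shifted outside are zero-filled."""
    h, w = image.shape
    # slant coefficient, negative or positive with equal probability
    coeff = rng.uniform(0.0, complexity) * rng.choice([-1, 1])
    out = np.zeros_like(image)
    for y in range(h):
        shift = int(round(coeff * y))  # largest displacement on bottom row
        for x in range(w):
            xs = x - shift
            if 0 <= xs < w:
                out[y, x] = image[y, xs]
    return out
```

With `complexity = 0` the coefficient is zero and the image is returned unchanged; for a 32-pixel-high image the extreme row moves by at most about `round(complexity * 32)`, matching the bound quoted in the diff.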
 $\alpha = \sqrt[3]{complexity} \times 10.0$ and
 $\sigma = 10 - 7 \times \sqrt[3]{complexity}$.\\
-{\bf Pinch}\\
+{\bf Pinch.}
 This GIMP filter is named "Whirl and pinch", but whirl was set to 0.
 A pinch is ``similar to projecting the image onto an elastic surface
 and pressing or pulling on the center of the surface''~\citep{GIMP-manual}.
@@ -185,13 +185,13 @@
 Here $pinch \sim U[-complexity, 0.7 \times complexity]$.\\
 {\large\bf Injecting Noise}\\
-{\bf Motion Blur}\\
+{\bf Motion Blur.}
 This GIMP filter is a ``linear motion blur'' in GIMP terminology,
 with two parameters, $length$ and $angle$. The value of a pixel in the
 final image is the approximately mean value of the $length$ first pixels
 found by moving in the $angle$ direction.
 Here $angle \sim U[0,360]$ degrees, and
 $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.\\
-{\bf Occlusion}\\
+{\bf Occlusion.}
 This filter selects a random rectangle from an {\em occluder} character
 images and places it over the original {\em occluded} character
 image. Pixels are combined by taking the max(occluder,occluded),
@@ -200,18 +200,18 @@
 The destination position in the occluded image are also sampled
 according to a normal distribution (see more details
 in~\citet{ift6266-tr-anonymous}).
 It has has a probability of not being applied at all of 60\%.\\
-{\bf Pixel Permutation}\\
+{\bf Pixel Permutation.}
 This filter permutes neighbouring pixels. It selects first
 $\frac{complexity}{3}$ pixels randomly in the image. Each of them are then
 sequentially exchanged to one other pixel in its $V4$ neighbourhood. Number
 of exchanges to the left, right, top, bottom are equal or does not differ
 from more than 1 if the number of selected pixels is not a multiple of 4.
 It has has a probability of not being applied at all of 80\%.\\
-{\bf Gaussian Noise}\\
+{\bf Gaussian Noise.}
 This filter simply adds, to each pixel of the image independently, a
 noise $\sim Normal(0,(\frac{complexity}{10})^2)$.
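Of the filters in the hunks above, the per-pixel Gaussian noise is the simplest to make concrete: noise drawn from ${\rm Normal}(0,(complexity/10)^2)$ is added independently to each pixel, and the filter is skipped entirely with probability 70%. A minimal NumPy sketch, assuming pixel values in $[0,1]$ (the clipping and function name are my own additions):

```python
import numpy as np

def gaussian_pixel_noise(image, complexity, rng):
    """Add i.i.d. Gaussian noise with std = complexity/10 to each pixel.
    Skipped entirely 70% of the time, as stated in the diff. Clipping to
    [0,1] is an assumption about the pixel range, not from the paper."""
    if rng.uniform() < 0.70:          # not applied at all, 70% of the time
        return image.copy()
    noisy = image + rng.normal(0.0, complexity / 10.0, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)
```

At `complexity = 0` the noise has zero standard deviation, so the output always equals the input regardless of whether the filter fires.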
 It has has a probability of not being applied at all of 70\%.\\
-{\bf Background Images}\\
+{\bf Background Images.}
 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
 background behind the letter. The background is chosen by first selecting,
 at random, an image from a set of images. Then a 32$\times$32 subregion
@@ -224,11 +224,11 @@
 Each background pixel value is multiplied by
 $\frac{max(maximage - contrast, 0)}{maxbg}$ (higher contrast yield darker
 background). The output image pixels are max(background,original).\\
-{\bf Salt and Pepper Noise}\\
+{\bf Salt and Pepper Noise.}
 This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
 The number of selected pixels is $0.2 \times complexity$.
 This filter has a probability of not being applied at all of 75\%.\\
-{\bf Spatially Gaussian Noise}\\
+{\bf Spatially Gaussian Noise.}
 Different regions of the image are spatially smoothed.
 The image is convolved with a symmetric Gaussian kernel of
 size and variance choosen uniformly in the ranges $[12,12 + 20 \times
@@ -242,7 +242,7 @@
 computed from the following element-wise operation:
 $\frac{image + filtered image \times mask}{mask+1}$.
 This filter has a probability of not being applied at all of 75\%.\\
-{\bf Scratches}\\
+{\bf Scratches.}
 The scratches module places line-like white patches on the image. The
 lines are heavily transformed images of the digit "1" (one), chosen
 at random among five thousands such 1 images. The 1 image is
@@ -256,7 +256,7 @@
 cases, two patches are generated, and otherwise three patches are
 generated. The patch is applied by taking the maximal value on any given
 patch or the original image, for each of the 32x32 pixel locations.\\
-{\bf Color and Contrast Changes}\\
+{\bf Color and Contrast Changes.}
 This filter changes the constrast and may invert the image polarity (white
 on black to black on white). The contrast $C$ is defined here as the
 difference between the maximum and the minimum pixel value of the image.
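The Salt and Pepper filter in the hunk above replaces a random subset of pixels (a fraction $0.2 \times complexity$) with $U[0,1]$ noise, and is skipped with probability 75%. A sketch under the assumption that the "number of selected pixels" is a fraction of the image and that pixels lie in $[0,1]$; the function name is illustrative:

```python
import numpy as np

def salt_and_pepper(image, complexity, rng):
    """Replace a random fraction (0.2 * complexity) of the pixels with
    values drawn from U[0,1]. Skipped entirely 75% of the time, as stated
    in the diff. Interpreting the pixel count as a fraction is an assumption."""
    if rng.uniform() < 0.75:          # not applied at all, 75% of the time
        return image.copy()
    out = image.copy()
    # boolean mask selecting ~0.2*complexity of the pixels
    mask = rng.uniform(size=image.shape) < 0.2 * complexity
    out[mask] = rng.uniform(size=int(mask.sum()))
    return out
```

With `complexity = 0` no pixel is ever selected, so the output equals the input whether or not the filter fires.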
@@ -360,7 +360,7 @@
 The hyper-parameters are the following: number of hidden units, taken in
 $\{300,500,800,1000,1500\}$. The optimization procedure is as follows.
 Training examples are presented in minibatches of size 20. A constant learning
-rate is chosen in $\{10^{-6},10^{-5},10^{-4},10^{-3},0.01, 0.025, 0.075, 0.1, 0.5\}$
+rate is chosen in $\{10^{-3},0.01, 0.025, 0.075, 0.1, 0.5\}$
 through preliminary experiments, and 0.1 was selected.
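The optimization procedure in this last hunk (constant learning rate, minibatches of size 20, with 0.1 ultimately selected) can be illustrated with a bare-bones NumPy training loop. This is a sketch using softmax regression as a stand-in model, not the authors' MLP code; all names and the synthetic-data setup in the usage below are assumptions.

```python
import numpy as np

def sgd_train(X, y, lr=0.1, batch_size=20, epochs=5, rng=np.random.default_rng()):
    """Minibatch SGD with a constant learning rate, mirroring the setup in
    the diff (batches of 20, fixed rate such as the selected 0.1).
    Trains a linear softmax classifier W as an illustrative stand-in."""
    n, d = X.shape
    k = int(y.max()) + 1
    W = np.zeros((d, k))
    for _ in range(epochs):
        order = rng.permutation(n)            # reshuffle each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            logits = X[idx] @ W
            logits -= logits.max(axis=1, keepdims=True)   # stable softmax
            p = np.exp(logits)
            p /= p.sum(axis=1, keepdims=True)
            p[np.arange(len(idx)), y[idx]] -= 1.0         # grad of cross-entropy
            W -= lr * X[idx].T @ p / len(idx)             # constant-rate update
    return W
```

Selecting the rate "through preliminary experiments" would amount to running this loop once per candidate value in the quoted set and keeping the one with the best validation error.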