ift6266: writeup/nips2010_submission.tex comparison

comparison writeup/nips2010_submission.tex @ 474:bcf024e6ab23

fits now, but still no graphics

author	Yoshua Bengio <bengioy@iro.umontreal.ca>
date	Sun, 30 May 2010 11:18:11 -0400
parents	2dd6e8962df1
children	db28764b8252

comparison


-:92d6df91939f
+:bcf024e6ab23
 There are two main parts in the pipeline. The first one,
 from slant to pinch below, performs transformations. The second
 part, from blur to contrast, adds different kinds of noise.
 {\large\bf Transformations}\\
-{\bf Slant}\\
+{\bf Slant.}
 We mimic slant by shifting each row of the image
 proportionally to its height: $shift = round(slant \times height)$.
 The $slant$ coefficient can be negative or positive with equal probability
 and its value is randomly sampled according to the complexity level:
 $slant \sim U[0,complexity]$, so the
 maximum displacement for the lowest or highest pixel line is
 $round(complexity \times 32)$.\\
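The slant rule above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `sample_slant` and `apply_slant` are hypothetical, the height of a row is taken to be its index counted from the top, and pixels shifted outside the image are dropped while vacated positions are filled with a background value of 0.

```python
import random


def sample_slant(complexity, rng=random):
    """Draw |slant| ~ U[0, complexity] and pick the sign with equal probability."""
    magnitude = rng.uniform(0.0, complexity)
    return magnitude if rng.random() < 0.5 else -magnitude


def apply_slant(image, slant, fill=0.0):
    """Shift each row horizontally by round(slant * height).

    Assumption: `image` is a list of equal-length rows and `height` is the
    row index counted from the top. Python's round() rounds ties to even.
    """
    width = len(image[0])
    out = []
    for height, row in enumerate(image):
        shift = round(slant * height)
        new_row = [fill] * width
        for x, value in enumerate(row):
            nx = x + shift
            if 0 <= nx < width:  # pixels pushed outside the image are dropped
                new_row[nx] = value
        out.append(new_row)
    return out
```

For a 32-row image, the last row is shifted by roughly $round(slant \times 31)$ pixels, which is in line with the bound of $round(complexity \times 32)$ quoted above.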
-{\bf Thickness}\\
+{\bf Thickness.}
 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
 are applied. The neighborhood of each pixel is multiplied
 element-wise with a {\em structuring element} matrix.
 The pixel value is replaced by the maximum or the minimum of the resulting
 matrix, respectively for dilation or erosion. Ten different structural elements with
 $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$
 for erosion.  A neutral element is always present in the set, and if it is
 chosen no transformation is applied.  Erosion allows only the six
 smallest structural elements because when the character is too thin it may
 be completely erased.\\
-{\bf Affine Transformations}\\
+{\bf Affine Transformations.}
 A $2 \times 3$ affine transform matrix (with
 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
 Each pixel $(x,y)$ of the output image takes the value of the pixel
 nearest to $(ax+by+c,dx+ey+f)$ in the input image.  This
 produces scaling, translation, rotation and shearing.
 forbid important rotations (not to confuse classes) but to give good
 variability of the transformation: $a$ and $d$ $\sim U[1-3 \times
 complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3
 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times
 complexity]$.\\
-{\bf Local Elastic Deformations}\\
+{\bf Local Elastic Deformations.}
 This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03},
 which provides more details.
 Two ``displacement'' fields are generated and applied, for horizontal
 and vertical displacements of pixels.
 To generate a pixel in either field, first a value between -1 and 1 is
 displacements (larger $\alpha$ translates into larger wiggles).
 Each field is convolved with a Gaussian 2D kernel of
 standard deviation $\sigma$. Visually, this results in a blur.
 $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times
 \sqrt[3]{complexity}$.\\
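As a quick check of the two formulas above, the following sketch computes $\alpha$ and $\sigma$ from the complexity level. The function name `elastic_params` is hypothetical and the block only covers the parameter computation, not the displacement fields themselves.

```python
def elastic_params(complexity):
    """Return (alpha, sigma) for the elastic deformation.

    alpha = cbrt(complexity) * 10 and sigma = 10 - 7 * cbrt(complexity),
    as given in the text. Assumes 0 <= complexity <= 1.
    """
    root = complexity ** (1.0 / 3.0)  # cube root of the complexity level
    alpha = root * 10.0
    sigma = 10.0 - 7.0 * root
    return alpha, sigma
```

At complexity 0 this gives $\alpha = 0$ and $\sigma = 10$ (no displacement), and at complexity 1 it gives $\alpha = 10$ and $\sigma = 3$, so higher complexity means larger and less smoothed displacements.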
-{\bf Pinch}\\
+{\bf Pinch.}
 This GIMP filter is named ``Whirl and
 pinch'', but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic
 surface and pressing or pulling on the center of the surface''~\citep{GIMP-manual}.
 For a square input image, think of drawing a circle of
 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
 The actual value is given by bilinear interpolation considering the pixels
 around the (non-integer) source position thus found.
 Here $pinch \sim U[-complexity, 0.7 \times complexity]$.\\
 {\large\bf Injecting Noise}\\
-{\bf Motion Blur}\\
+{\bf Motion Blur.}
 This GIMP filter is a ``linear motion blur'' in GIMP
 terminology, with two parameters, $length$ and $angle$. The value of
 a pixel in the final image is approximately the mean value of the first $length$ pixels
 found by moving in the $angle$ direction.
 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.\\
-{\bf Occlusion}\\
+{\bf Occlusion.}
 This filter selects a random rectangle from an {\em occluder} character
 image and places it over the original {\em occluded} character
 image. Pixels are combined by taking the max(occluder,occluded),
 i.e.\ the value closer to black. The rectangle corners
 are sampled so that larger complexity gives larger rectangles.
 The destination position in the occluded image is also sampled
 according to a normal distribution (see more details in~\citet{ift6266-tr-anonymous}).
 This filter has a 60\% probability of not being applied at all.\\
-{\bf Pixel Permutation}\\
+{\bf Pixel Permutation.}
 This filter permutes neighbouring pixels. It first selects
 $\frac{complexity}{3}$ pixels randomly in the image. Each of them is then
 sequentially exchanged with another pixel in its $V4$ neighbourhood. The numbers
 of exchanges to the left, right, top and bottom are equal, or differ
 by at most 1 if the number of selected pixels is not a multiple of 4.
 This filter has an 80\% probability of not being applied at all.\\
-{\bf Gaussian Noise}\\
+{\bf Gaussian Noise.}
 This filter simply adds, to each pixel of the image independently, a
 noise $\sim {\rm Normal}(0,(\frac{complexity}{10})^2)$.
 This filter has a 70\% probability of not being applied at all.\\
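A minimal sketch of this filter in plain Python follows. The function name `add_gaussian_noise` and the `skip_prob` parameter are illustrative assumptions; the text does not say whether the result is clipped back to the valid intensity range, so no clipping is done here.

```python
import random


def add_gaussian_noise(image, complexity, rng=random, skip_prob=0.7):
    """Add independent Normal(0, (complexity/10)^2) noise to every pixel.

    With probability `skip_prob` (70% in the text) the filter is not
    applied and an unchanged copy of the image is returned.
    """
    if rng.random() < skip_prob:
        return [row[:] for row in image]
    sigma = complexity / 10.0  # standard deviation of the noise
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in image]
```

Passing a seeded `random.Random` instance as `rng` makes the sampling reproducible.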
-{\bf Background Images}\\
+{\bf Background Images.}
 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
 background behind the letter. The background is chosen by first selecting,
 at random, an image from a set of images. Then a 32$\times$32 subregion
 of that image is chosen as the background image (by sampling position
 uniformly while making sure not to cross image borders).
 intensity) for both the original image and the background image, $maximage$
 and $maxbg$. We also have a parameter $contrast \sim U[complexity, 1]$.
 Each background pixel value is multiplied by $\frac{max(maximage -
 contrast, 0)}{maxbg}$ (higher contrast yields a darker
 background). The output image pixels are max(background,original).\\
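The scaling and compositing step can be sketched as below. This assumes that $maximage$ and $maxbg$ are the maximum pixel intensities of the character image and of the background patch respectively (the diff context elides the start of their definition), and that both images are same-sized lists of rows with values in $[0,1]$. The name `blend_background` is hypothetical.

```python
def blend_background(image, background, contrast):
    """Scale the background by max(maximage - contrast, 0) / maxbg,
    then take the pixel-wise maximum with the original image.

    Assumption: maximage and maxbg are the maximum intensities of the
    two images. An all-zero background is left at zero.
    """
    maximage = max(max(row) for row in image)
    maxbg = max(max(row) for row in background)
    scale = max(maximage - contrast, 0.0) / maxbg if maxbg > 0 else 0.0
    return [
        [max(b * scale, p) for p, b in zip(img_row, bg_row)]
        for img_row, bg_row in zip(image, background)
    ]
```

With $contrast$ drawn from $U[complexity, 1]$, a contrast of 1 sets the scale to zero for images with $maximage \leq 1$, so the background vanishes entirely.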
-{\bf Salt and Pepper Noise}\\
+{\bf Salt and Pepper Noise.}
 This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
 The number of selected pixels is $0.2 \times complexity$.
 This filter has a probability of not being applied at all of 75\%.\\
-{\bf Spatially Gaussian Noise}\\
+{\bf Spatially Gaussian Noise.}
 Different regions of the image are spatially smoothed.
 The image is convolved with a symmetric Gaussian kernel of
 size and variance chosen uniformly in the ranges $[12,12 + 20 \times
 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
 between $0$ and $1$.  We also create a symmetric averaging window, of the
 initialize to zero a mask matrix of the image size. For each selected pixel
 we add to the mask the averaging window centered on it.  The final image is
 computed from the following element-wise operation: $\frac{image + filtered
 image \times mask}{mask+1}$.
 This filter has a probability of not being applied at all of 75\%.\\
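The final element-wise combination can be written as a short function. This sketch covers only the blending formula, taking the smoothed image and the mask as given; the name `blend_with_mask` is hypothetical.

```python
def blend_with_mask(image, filtered, mask):
    """Compute (image + filtered * mask) / (mask + 1) element-wise.

    All three arguments are same-sized lists of rows. Where mask is 0 the
    original pixel is kept; larger mask values pull the result towards
    the smoothed (filtered) image.
    """
    return [
        [(p + f * m) / (m + 1.0) for p, f, m in zip(img_row, flt_row, msk_row)]
        for img_row, flt_row, msk_row in zip(image, filtered, mask)
    ]
```

For example, a mask value of 1 yields the plain average of the original and the smoothed pixel.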
-{\bf Scratches}\\
+{\bf Scratches.}
 The scratches module places line-like white patches on the image.  The
 lines are heavily transformed images of the digit ``1'' (one), chosen
 at random among five thousand such images. The ``1'' image is
 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
 complexity)^2)$, using bicubic interpolation,
 This filter is applied only 15\% of the time. When it is applied, 50\%
 of the time, only one patch image is generated and applied. In 30\% of
 cases, two patches are generated, and otherwise three patches are
 generated. Each patch is applied by taking the maximal value of the
 patch and the original image, for each of the 32$\times$32 pixel locations.\\
-{\bf Color and Contrast Changes}\\
+{\bf Color and Contrast Changes.}
 This filter changes the contrast and may invert the image polarity (white
 on black to black on white). The contrast $C$ is defined here as the
 difference between the maximum and the minimum pixel value of the image.
 Contrast $\sim U[1-0.85 \times complexity,1]$ (so contrast $\geq 0.15$).
 The image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
 The MLP has a single hidden layer with $\tanh$ activation functions, and softmax (normalized
 exponentials) on the output layer for estimating P(class | image).
 The hyper-parameters are the following: number of hidden units, taken in
 $\{300,500,800,1000,1500\}$. The optimization procedure is as follows. Training
 examples are presented in minibatches of size 20. A constant learning
-rate is chosen in $\{10^{-6},10^{-5},10^{-4},10^{-3},0.01, 0.025, 0.075, 0.1, 0.5\}$
+rate is chosen in $\{10^{-3},0.01, 0.025, 0.075, 0.1, 0.5\}$
 through preliminary experiments, and 0.1 was selected.
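The forward pass of such an MLP can be sketched in plain Python. This is an illustration of the architecture described above (one $\tanh$ hidden layer, softmax output), not the code used for the experiments. The names `softmax` and `mlp_forward` and the weight layout (one row of weights per unit) are assumptions.

```python
import math


def softmax(scores):
    """Normalized exponentials; subtracting the max avoids overflow."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(x, W1, b1, W2, b2):
    """Estimate P(class | image) with a single tanh hidden layer.

    W1 has one row per hidden unit and W2 one row per class; x is the
    flattened input image.
    """
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W1, b1)
    ]
    scores = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(W2, b2)
    ]
    return softmax(scores)
```

Training with minibatches of size 20 and a constant learning rate, as described above, would update `W1`, `b1`, `W2` and `b2` by gradient descent on the negative log-likelihood; that part is not shown here.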
 \subsubsection{Stacked Denoising Auto-Encoders (SDAE)}
 \label{SdA}
