# HG changeset patch
# User Yoshua Bengio <bengioy@iro.umontreal.ca>
# Date 1275232691 14400
# Node ID bcf024e6ab2312041b9ae7174cd815cf1079dc90
# Parent  92d6df91939f050588f16b0951c937fbbb3c33f9
fits now, but still no graphics

diff -r 92d6df91939f -r bcf024e6ab23 writeup/nips2010_submission.tex
--- a/writeup/nips2010_submission.tex	Sun May 30 10:44:40 2010 -0400
+++ b/writeup/nips2010_submission.tex	Sun May 30 11:18:11 2010 -0400
@@ -121,7 +121,7 @@
 part, from blur to contrast, adds different kinds of noise.
 
 {\large\bf Transformations}\\
-{\bf Slant}\\
+{\bf Slant.} 
 We mimic slant by shifting each row of the image
 proportionally to its height: $shift = round(slant \times height)$.
 The $slant$ coefficient can be negative or positive with equal probability
@@ -129,7 +129,7 @@
 e $slant \sim U[0,complexity]$, so the
 maximum displacement for the lowest or highest pixel line is
 $round(complexity \times 32)$.\\
-{\bf Thickness}\\
+{\bf Thickness.}
 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
 are applied. The neighborhood of each pixel is multiplied
 element-wise with a {\em structuring element} matrix.
@@ -143,7 +143,7 @@
 chosen no transformation is applied.  Erosion allows only the six
 smallest structuring elements because when the character is too thin it may
 be completely erased.\\
-{\bf Affine Transformations}\\
+{\bf Affine Transformations.}
 A $2 \times 3$ affine transform matrix (with
 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
 Each pixel $(x,y)$ of the output image takes the value of the pixel
@@ -155,7 +155,7 @@
 complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3
 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times
 complexity]$.\\
-{\bf Local Elastic Deformations}\\
+{\bf Local Elastic Deformations.}
 This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03},
 which provides more details.
 Two ``displacement'' fields are generated and applied, for horizontal
@@ -168,7 +168,7 @@
 standard deviation $\sigma$. Visually, this results in a blur.
 $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times
 \sqrt[3]{complexity}$.\\
-{\bf Pinch}\\
+{\bf Pinch.}
 This GIMP filter is named ``Whirl and
 pinch'', but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic
 surface and pressing or pulling on the center of the surface''~\citep{GIMP-manual}.
@@ -185,13 +185,13 @@
 Here $pinch \sim U[-complexity, 0.7 \times complexity]$.\\
 
 {\large\bf Injecting Noise}\\
-{\bf Motion Blur}\\
+{\bf Motion Blur.}
 This GIMP filter is a ``linear motion blur'' in GIMP
 terminology, with two parameters, $length$ and $angle$. The value of
 a pixel in the final image is approximately the mean value of the first $length$ pixels
 found by moving in the $angle$ direction. 
 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.\\
-{\bf Occlusion}\\
+{\bf Occlusion.}
 This filter selects a random rectangle from an {\em occluder} character
 image and places it over the original {\em occluded} character
 image. Pixels are combined by taking the max(occluder,occluded),
@@ -200,18 +200,18 @@
 The destination position in the occluded image is also sampled
 according to a normal distribution (see more details in~\citet{ift6266-tr-anonymous}).
 It has a 60\% probability of not being applied at all.\\
-{\bf Pixel Permutation}\\
+{\bf Pixel Permutation.}
 This filter permutes neighbouring pixels. It first selects
 $\frac{complexity}{3}$ pixels randomly in the image. Each of them is then
 sequentially exchanged with one other pixel in its $V4$ neighbourhood. The numbers
 of exchanges to the left, right, top, and bottom are equal, or differ
 by at most 1 if the number of selected pixels is not a multiple of 4.
 It has an 80\% probability of not being applied at all.\\
-{\bf Gaussian Noise}\\
+{\bf Gaussian Noise.}
 This filter simply adds, to each pixel of the image independently, a
 noise $\sim {\rm Normal}(0,(\frac{complexity}{10})^2)$.
 It has a 70\% probability of not being applied at all.\\
-{\bf Background Images}\\
+{\bf Background Images.}
 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
 background behind the letter. The background is chosen by first selecting,
 at random, an image from a set of images. Then a 32$\times$32 subregion
@@ -224,11 +224,11 @@
 Each background pixel value is multiplied by $\frac{max(maximage -
   contrast, 0)}{maxbg}$ (higher contrast yields a darker
 background). The output image pixels are max(background,original).\\
-{\bf Salt and Pepper Noise}\\
+{\bf Salt and Pepper Noise.}
 This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
 The number of selected pixels is $0.2 \times complexity$.
 This filter has a 75\% probability of not being applied at all.\\
-{\bf Spatially Gaussian Noise}\\
+{\bf Spatially Gaussian Noise.}
 Different regions of the image are spatially smoothed.
 The image is convolved with a symmetric Gaussian kernel of
 size and variance chosen uniformly in the ranges $[12,12 + 20 \times
@@ -242,7 +242,7 @@
 computed from the following element-wise operation: $\frac{image + filtered
   image \times mask}{mask+1}$.
 This filter has a 75\% probability of not being applied at all.\\
-{\bf Scratches}\\
+{\bf Scratches.}
 The scratches module places line-like white patches on the image.  The
 lines are heavily transformed images of the digit ``1'' (one), chosen
 at random among five thousand such 1 images. The 1 image is
@@ -256,7 +256,7 @@
 cases, two patches are generated, and otherwise three patches are
 generated. Each patch is applied by taking the maximum of the patch and the
 original image at each of the 32$\times$32 pixel locations.\\
-{\bf Color and Contrast Changes}\\
+{\bf Color and Contrast Changes.}
 This filter changes the contrast and may invert the image polarity (white
 on black to black on white). The contrast $C$ is defined here as the
 difference between the maximum and the minimum pixel value of the image. 
@@ -360,7 +360,7 @@
 The hyper-parameters are the following: number of hidden units, taken in 
 $\{300,500,800,1000,1500\}$. The optimization procedure is as follows. Training
 examples are presented in minibatches of size 20. A constant learning
-rate is chosen in $\{10^{-6},10^{-5},10^{-4},10^{-3},0.01, 0.025, 0.075, 0.1, 0.5\}$
+rate is chosen in $\{10^{-3},0.01, 0.025, 0.075, 0.1, 0.5\}$
 through preliminary experiments, and 0.1 was selected.