ift6266: writeup/nips2010_submission.tex comparison

comparison writeup/nips2010_submission.tex @ 541:8aad1c6ec39a

space reduction

author	Yoshua Bengio <bengioy@iro.umontreal.ca>
date	Wed, 02 Jun 2010 10:23:33 -0400
parents	84f42fe05594
children	1cdfc17e890f

-:269c39f55134
+:8aad1c6ec39a
 There are two main parts in the pipeline. The first one,
 from slant to pinch below, performs transformations. The second
 part, from blur to contrast, adds different kinds of noise.
 \begin{figure}[ht]
+\vspace*{-2mm}
 \centerline{\resizebox{.9\textwidth}{!}{\includegraphics{images/transfo.png}}}
 % TODO: PUT THE NAME OF THE TRANSFORMATION NEXT TO EACH IMAGE
 \caption{Illustration of each transformation applied alone to the same image
 of an upper-case h (top left). First row (from left to right): original image, slant,
 thickness, affine transformation (translation, rotation, shear),
 local elastic deformation; second row (from left to right):
 pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right):
 background image, salt and pepper noise, spatially Gaussian noise, scratches,
 grey level and contrast changes.}
 \label{fig:transfo}
+\vspace*{-2mm}
 \end{figure}
 {\large\bf Transformations}
-\vspace*{2mm}
+\vspace*{0.5mm}
 {\bf Slant.}
-We mimic slant by shifting each row of the image
+Each row of the image is shifted
 proportionally to its height: $shift = round(slant \times height)$.
-The $slant$ coefficient can be negative or positive with equal probability
+$slant \sim U[-complexity,complexity]$.
-and its value is randomly sampled according to the complexity level:
+\vspace*{-1mm}
-$slant \sim U[0,complexity]$, so the
-maximum displacement for the lowest or highest pixel line is of
-$round(complexity \times 32)$.
-\vspace*{0mm}
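The slant rule above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names `slant_image` and `sample_slant` are hypothetical, the "height" of a row is assumed to be its index counted from the top, and pixels shifted outside the image are assumed to be dropped and replaced by a background value of 0.

```python
import random


def slant_image(image, slant, fill=0.0):
    """Shift each row horizontally by round(slant * row_index).

    Assumption: the row index (counted from the top) plays the role of
    'height' in shift = round(slant * height).
    """
    width = len(image[0])
    out = []
    for row_index, row in enumerate(image):
        shift = round(slant * row_index)
        new_row = [fill] * width
        for x, value in enumerate(row):
            target = x + shift
            # Assumption: pixels pushed outside the image are discarded.
            if 0 <= target < width:
                new_row[target] = value
        out.append(new_row)
    return out


def sample_slant(complexity, rng=random):
    """Draw slant ~ U[-complexity, complexity]."""
    return rng.uniform(-complexity, complexity)
```

A call such as `slant_image(img, sample_slant(0.5))` would then produce one randomly slanted copy of `img`, given as a list of rows.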
 {\bf Thickness.}
 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
 are applied. The neighborhood of each pixel is multiplied
 element-wise with a {\em structuring element} matrix.
 The pixel value is replaced by the maximum or the minimum of the resulting
 matrix, respectively for dilation or erosion. Ten different structuring elements with
 increasing dimensions (largest is $5\times5$) were used.  For each image,
 randomly sample the operator type (dilation or erosion) with equal probability and one structuring
-element from a subset of the $n$ smallest structuring elements where $n$ is
+element from a subset of the $n=round(m \times complexity)$ smallest structuring elements
-$round(10 \times complexity)$ for dilation and $round(6 \times complexity)$
+where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters).
-for erosion.  A neutral element is always present in the set, and if it is
+A neutral element (no transformation)
-chosen no transformation is applied.  Erosion allows only the six
+is always present in the set.
-smallest structural elements because when the character is too thin it may
+\vspace*{-1mm}
-be completely erased.
-\vspace*{0mm}
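As a rough illustration of how the operator and the structuring element could be sampled, the sketch below follows the rule $n = round(m \times complexity)$ with $m=10$ for dilation and $m=6$ for erosion. The function name `sample_thickness_op` is hypothetical, and the indexing convention (index 0 is the neutral element, indices 1 to $n$ are the $n$ smallest structuring elements) is an assumption, since the text does not say how the neutral element is counted.

```python
import random


def sample_thickness_op(complexity, rng=random):
    """Pick an operator and the index of a structuring element.

    Assumption: index 0 stands for the neutral element (no transformation)
    and indices 1..n for the n smallest structuring elements.
    """
    op = rng.choice(["dilation", "erosion"])
    # Erosion is limited to fewer elements to avoid erasing thin characters.
    m = 10 if op == "dilation" else 6
    n = round(m * complexity)
    index = rng.randrange(n + 1)  # uniform over {0, 1, ..., n}
    return op, index
```

With `complexity = 0` only the neutral element can be drawn, so the image is left unchanged under this convention.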
 {\bf Affine Transformations.}
 A $2 \times 3$ affine transform matrix (with
 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
-Each pixel $(x,y)$ of the output image takes the value of the pixel
+Output pixel $(x,y)$ takes the value of input pixel
-nearest to $(ax+by+c,dx+ey+f)$ in the input image.  This
+nearest to $(ax+by+c,dx+ey+f)$,
-produces scaling, translation, rotation and shearing.
+producing scaling, translation, rotation and shearing.
 The marginal distributions of $(a,b,c,d,e,f)$ have been tuned by hand to
 forbid large rotations (so as not to confuse classes) while still giving good
 variability of the transformation: $a$ and $d$ $\sim U[1-3 \times
 complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3
 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times
 complexity]$.
-\vspace*{0mm}
+\vspace*{-1mm}
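The nearest-pixel lookup described for the affine transformation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `affine_transform` is hypothetical, the image is a list of rows indexed as `image[y][x]`, and output pixels whose source falls outside the input are assumed to take a background value of 0. Sampling of the six parameters is deliberately left out.

```python
def affine_transform(image, params, fill=0.0):
    """Output pixel (x, y) takes the input pixel nearest to
    (a*x + b*y + c, d*x + e*y + f)."""
    a, b, c, d, e, f = params
    height, width = len(image), len(image[0])
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            src_x = round(a * x + b * y + c)
            src_y = round(d * x + e * y + f)
            # Assumption: sources outside the input map to the background.
            if 0 <= src_x < width and 0 <= src_y < height:
                out[y][x] = image[src_y][src_x]
    return out
```

For example, `params = (1, 0, 1, 0, 1, 0)` reads each output pixel from one column to its right, which shifts the image content one pixel to the left.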
 {\bf Local Elastic Deformations.}
 This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short},
 which provides more details.
-Two ``displacements'' fields are generated and applied, for horizontal
+The intensity of the displacement fields is given by
-and vertical displacements of pixels.
+$\alpha = \sqrt[3]{complexity} \times 10.0$; the fields are
-To generate a pixel in either field, first a value between -1 and 1 is
+convolved with a Gaussian 2D kernel (resulting in a blur) of
-chosen from a uniform distribution. Then all the pixels, in both fields, are
+standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$.
-multiplied by a constant $\alpha$ which controls the intensity of the
+\vspace*{-1mm}
-displacements (larger $\alpha$ translates into larger wiggles).
-Each field is convoluted with a Gaussian 2D kernel of
-standard deviation $\sigma$. Visually, this results in a blur.
-$\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times
-\sqrt[3]{complexity}$.
-\vspace*{0mm}
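The dependence of $\alpha$ and $\sigma$ on the complexity level can be checked with a short sketch. The function names `elastic_params` and `random_displacement_field` are hypothetical. The field generation follows the description in the earlier revision of the text (uniform values in $[-1,1]$ scaled by $\alpha$), and the Gaussian smoothing step is omitted here.

```python
import random


def elastic_params(complexity):
    """Return (alpha, sigma) for a given complexity in [0, 1]."""
    root = complexity ** (1.0 / 3.0)  # cube root of the complexity
    alpha = 10.0 * root               # intensity of the displacements
    sigma = 10.0 - 7.0 * root         # std. dev. of the Gaussian kernel
    return alpha, sigma


def random_displacement_field(height, width, alpha, rng=random):
    """One unsmoothed displacement field: U[-1, 1] values scaled by alpha."""
    return [[alpha * rng.uniform(-1.0, 1.0) for _ in range(width)]
            for _ in range(height)]
```

Higher complexity therefore gives larger displacements ($\alpha$ grows from 0 to 10) smoothed by a narrower kernel ($\sigma$ shrinks from 10 to 3), that is, stronger and more local wiggles.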
 {\bf Pinch.}
-This is a GIMP filter called ``Whirl and
+This is the ``Whirl and pinch'' GIMP filter with whirl set to 0.
-pinch'', but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic
+A pinch is ``similar to projecting the image onto an elastic
 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual).
 For a square input image, this is akin to drawing a circle of
 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
 that disk (region inside circle) will have its value recalculated by taking
 the value of another ``source'' pixel in the original image. The position of
 d_1$, where $pinch$ is a parameter to the filter.
 The actual value is given by bilinear interpolation considering the pixels
 around the (non-integer) source position thus found.
 Here $pinch \sim U[-complexity, 0.7 \times complexity]$.
-\vspace*{1mm}
+\vspace*{0.5mm}
 {\large\bf Injecting Noise}
-\vspace*{1mm}
+\vspace*{0.5mm}
 {\bf Motion Blur.}
 This is a ``linear motion blur'' in GIMP
 terminology, with two parameters, $length$ and $angle$. The value of
 a pixel in the final image is approximately the mean value of the first $length$ pixels
 found by moving in the $angle$ direction.
 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.
-\vspace*{0mm}
+\vspace*{-1mm}
 {\bf Occlusion.}
 Selects a random rectangle from an {\em occluder} character
 image and places it over the original {\em occluded} character
 image. Pixels are combined by taking the max(occluder,occluded),
 i.e., the value closer to black. The rectangle corners
 are sampled so that larger complexity gives larger rectangles.
 The destination position in the occluded image is also sampled
 according to a normal distribution (see more details in~\citet{ift6266-tr-anonymous}).
 This filter has a probability of 60\% of not being applied.
-\vspace*{0mm}
+\vspace*{-1mm}
 {\bf Pixel Permutation.}
 This filter permutes neighbouring pixels. It first selects
 $\frac{complexity}{3}$ pixels randomly in the image. Each of them is then
 sequentially exchanged with one other pixel in its $V4$ neighbourhood. The numbers
 of exchanges to the left, right, top and bottom are equal, or differ
 by at most 1 if the number of selected pixels is not a multiple of 4.
 % TODO: The previous sentence is hard to parse
 This filter has a probability of 80\% of not being applied.
-\vspace*{0mm}
+\vspace*{-1mm}
 {\bf Gaussian Noise.}
 This filter simply adds, to each pixel of the image independently, a
 noise $\sim {\rm Normal}(0,(\frac{complexity}{10})^2)$.
 It has a probability of 70\% of not being applied.
-\vspace*{0mm}
+\vspace*{-1mm}
 {\bf Background Images.}
 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
 background behind the letter. The background is chosen by first selecting,
 at random, an image from a set of images. Then a 32$\times$32 sub-region
 intensity) for both the original image and the background image, $maximage$
 and $maxbg$. We also have a parameter $contrast \sim U[complexity, 1]$.
 Each background pixel value is multiplied by $\frac{max(maximage -
 contrast, 0)}{maxbg}$ (higher contrast yields darker
 background). The output image pixels are max(background,original).
-\vspace*{0mm}
+\vspace*{-1mm}
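The background scaling and the final pixel-wise maximum can be illustrated with the sketch below. It is only a sketch under stated assumptions: the function name `add_background` is hypothetical, both inputs are assumed to be equally sized lists of rows with intensities in $[0,1]$, and the choice of the background sub-region is not shown. The guard for an all-zero background is an addition made for the example, not something stated in the text.

```python
def add_background(image, background, contrast):
    """Scale the background by max(maximage - contrast, 0) / maxbg,
    then take the pixel-wise max(background, original)."""
    max_image = max(max(row) for row in image)
    max_bg = max(max(row) for row in background)
    if max_bg == 0:
        # Assumption: an all-zero background leaves the image unchanged.
        return [list(row) for row in image]
    scale = max(max_image - contrast, 0.0) / max_bg
    return [[max(bg * scale, px) for bg, px in zip(bg_row, row)]
            for bg_row, row in zip(background, image)]
```

With `contrast` close to 1 the scale factor approaches 0, so the background nearly vanishes and the character stands out clearly.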
 {\bf Salt and Pepper Noise.}
 This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
 The number of selected pixels is $0.2 \times complexity$.
 This filter has a probability of 75\% of not being applied.
-\vspace*{0mm}
+\vspace*{-1mm}
 {\bf Spatially Gaussian Noise.}
 Different regions of the image are spatially smoothed.
 The image is convolved with a symmetric Gaussian kernel of
 size and variance chosen uniformly in the ranges $[12,12 + 20 \times
 initialize to zero a mask matrix of the image size. For each selected pixel
 we add to the mask the averaging window centered on it.  The final image is
 computed from the following element-wise operation: $\frac{image + filtered
 image \times mask}{mask+1}$.
 This filter has a probability of 75\% of not being applied.
-\vspace*{0mm}
+\vspace*{-1mm}
 {\bf Scratches.}
 The scratches module places line-like white patches on the image.  The
 lines are heavily transformed images of the digit ``1'' (one), chosen
 at random among five thousand such images. The 1 image is
 This filter is applied only 15\% of the time. When it is applied, 50\%
 of the time, only one patch image is generated and applied. In 30\% of
 cases, two patches are generated, and otherwise three patches are
 generated. The patch is applied by taking the maximal value of the
 patch and the original image at each of the $32\times32$ pixel locations.
-\vspace*{0mm}
+\vspace*{-1mm}
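The probabilities stated for the scratches filter translate into a simple sampling rule for the number of patches. The sketch below is illustrative only: the function name `sample_num_scratches` is hypothetical, and drawing two independent uniform numbers (one for applying the filter, one for the patch count) is an assumption about how the sampling could be organised.

```python
import random


def sample_num_scratches(rng=random):
    """Return how many scratch patches to apply (0 means filter skipped)."""
    # The filter is applied only 15% of the time.
    if rng.random() >= 0.15:
        return 0
    u = rng.random()
    if u < 0.5:    # 50% of applied cases: one patch
        return 1
    if u < 0.8:    # 30% of applied cases: two patches
        return 2
    return 3       # remaining 20%: three patches
```

Overall, under these numbers, an image receives one patch with probability 0.075, two with probability 0.045 and three with probability 0.03.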
 {\bf Grey Level and Contrast Changes.}
 This filter changes the contrast and may invert the image polarity (white
 on black to black on white). The contrast $C$ is defined here as the
 difference between the maximum and the minimum pixel value of the image.
 rate was chosen among $\{0.001, 0.01, 0.025, 0.075, 0.1, 0.5\}$
 through preliminary experiments (measuring performance on a validation set),
 and $0.1$ was then selected for optimizing on the whole training sets.
 \begin{figure}[ht]
+\vspace*{-2mm}
 \centerline{\resizebox{0.8\textwidth}{!}{\includegraphics{images/denoising_autoencoder_small.pdf}}}
 \caption{Illustration of the computations and training criterion for the denoising
 auto-encoder used to pre-train each layer of the deep architecture. Input $x$ of
 the layer (i.e. raw input or output of previous layer)
 is corrupted into $\tilde{x}$ and encoded into code $y$ by the encoder $f_\theta(\cdot)$.
 The decoder $g_{\theta'}(\cdot)$ maps $y$ to reconstruction $z$, which
 is compared to the uncorrupted input $x$ through the loss function
 $L_H(x,z)$, whose expected value is approximately minimized during training
 by tuning $\theta$ and $\theta'$.}
 \label{fig:da}
+\vspace*{-2mm}
 \end{figure}
 {\bf Stacked Denoising Auto-Encoders (SDA).}
 Various auto-encoder variants and Restricted Boltzmann Machines (RBMs)
 can be used to initialize the weights of each layer of a deep MLP (with many hidden
 stacked denoising auto-encoders on MNIST~\citep{VincentPLarochelleH2008}.
 \vspace*{-1mm}
 \begin{figure}[ht]
+\vspace*{-2mm}
 \centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}}
 \caption{Error bars indicate a 95\% confidence interval. 0 indicates that the model was trained
 on NIST, 1 on NISTP, and 2 on P07. Left: overall results
 of all models, on 3 different test sets (NIST, NISTP, P07).
 Right: error rates on NIST test digits only, along with the previous results from the
 literature~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}
 respectively based on ART, nearest neighbors, MLPs, and SVMs.}
 \label{fig:error-rates-charts}
-\vspace*{-1mm}
+\vspace*{-2mm}
 \end{figure}
 \section{Experimental Results}
