diff writeup/nips2010_submission.tex @ 555:b6dfba0a110c

improve the visual appearance, Myriam
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Thu, 03 Jun 2010 08:09:35 -0400
parents e95395f51d72
children 17d16700e0c8 143a1467f157
--- a/writeup/nips2010_submission.tex	Wed Jun 02 18:17:52 2010 -0400
+++ b/writeup/nips2010_submission.tex	Thu Jun 03 08:09:35 2010 -0400
@@ -1,6 +1,6 @@
 \documentclass{article} % For LaTeX2e
 \usepackage{nips10submit_e,times}
-
+\usepackage{wrapfig}
 \usepackage{amsthm,amsmath,bbm} 
 \usepackage[psamsfonts]{amssymb}
 \usepackage{algorithm,algorithmic}
@@ -22,7 +22,7 @@
 \begin{abstract}
 Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of unlabeled examples. Here we explore the advantage brought by {\em out-of-distribution examples} and show that {\em deep learners benefit more from them than a corresponding shallow learner}, in the area of handwritten character recognition. In fact, we show that they reach human-level performance on both handwritten digit classification and 62-class handwritten character recognition.  For this purpose we developed a powerful generator of stochastic variations and noise processes for character images, including not only affine transformations but also slant, local elastic deformations, changes in thickness, background images, grey level changes, contrast, occlusion, and various types of noise. The out-of-distribution examples are obtained from these highly distorted images or by including examples of object classes different from those in the target test set.
 \end{abstract}
-\vspace*{-2mm}
+\vspace*{-3mm}
 
 \section{Introduction}
 \vspace*{-1mm}
@@ -77,11 +77,10 @@
 (one exception is~\citep{CollobertR2008}, which uses very different kinds
 of learning algorithms). In particular the {\em relative
 advantage} of deep learning for these settings has not been evaluated.
-The hypothesis explored here is that a deep hierarchy of features
+The hypothesis discussed in the conclusion is that a deep hierarchy of features
 may be better able to provide sharing of statistical strength
-between different regions in input space or different tasks,
-as discussed in the conclusion.
-
+between different regions in input space or different tasks.
+%
 In this paper we ask the following questions:
 
 %\begin{enumerate}
@@ -117,19 +116,23 @@
 \label{s:perturbations}
 \vspace*{-1mm}
 
-\begin{minipage}[b]{0.14\linewidth}
-\centering
-\includegraphics[scale=.45]{images/Original.PNG}
-\label{fig:Original}
-\vspace{1.2cm}
-\end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Original.}
+\begin{wrapfigure}[8]{l}{0.15\textwidth}
+%\begin{minipage}[b]{0.14\linewidth}
+\vspace*{-5mm}
+\begin{center}
+\includegraphics[scale=.4]{images/Original.PNG}\\
+{\bf Original}
+\end{center}
+\end{wrapfigure}
+%\vspace{0.7cm}
+%\end{minipage}%
+%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
 This section describes the different transformations we used to stochastically
 transform source images such as the one on the left
 in order to obtain data from a larger distribution which
 covers a domain substantially larger than the clean characters distribution from
-which we start. Although character transformations have been used before to
+which we start. 
+Although character transformations have been used before to
 improve character recognizers, this effort is on a large scale both
 in number of classes and in the complexity of the transformations, hence
 in the complexity of the learning task.
@@ -142,34 +145,26 @@
 There are two main parts in the pipeline. The first one,
 from slant to pinch below, performs transformations. The second
 part, from blur to contrast, adds different kinds of noise.
-\end{minipage}
+%\end{minipage}
 
-{\large\bf Transformations}
+\vspace*{1mm}
+%\subsection{Transformations}
+{\large\bf 2.1 Transformations}
+\vspace*{1mm}
 
 
-\begin{minipage}[b]{0.14\linewidth}
-\centering
-\includegraphics[scale=.45]{images/Slant_only.PNG}
-\label{fig:Slant}
-\end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
+\begin{wrapfigure}[7]{l}{0.15\textwidth}
+%\begin{minipage}[b]{0.14\linewidth}
 %\centering
-{\bf Slant.}
-Each row of the image is shifted
-proportionally to its height: $shift = round(slant \times height)$.  
-$slant \sim U[-complexity,complexity]$.
-\vspace{1.2cm}
-\end{minipage}
-
-
-\begin{minipage}[b]{0.14\linewidth}
-\centering
-\includegraphics[scale=.45]{images/Thick_only.PNG}
-\label{fig:Thick}
-\vspace{.9cm}
-\end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Thickness.}
+\begin{center}
+\vspace*{-5mm}
+\includegraphics[scale=.4]{images/Thick_only.PNG}\\
+{\bf Thickness}
+\end{center}
+%\vspace{.6cm}
+%\end{minipage}%
+%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
+\end{wrapfigure}
 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
 are applied. The neighborhood of each pixel is multiplied
 element-wise with a {\em structuring element} matrix.
@@ -181,102 +176,137 @@
 where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters).  
 A neutral element (no transformation) 
 is always present in the set.
-\vspace{.4cm}
-\end{minipage}
-\vspace{-.7cm}
-
+%\vspace{.4cm}
+%\end{minipage}
+%\vspace{-.7cm}
 
 \begin{minipage}[b]{0.14\linewidth}
 \centering
-\includegraphics[scale=.45]{images/Affine_only.PNG}
-\label{fig:Affine}
+\includegraphics[scale=.4]{images/Slant_only.PNG}\\
+{\bf Slant}
 \end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Affine Transformations.}
+\hspace{0.3cm}\begin{minipage}[b]{0.83\linewidth}
+%\centering
+%\vspace*{-15mm}
+Each row of the image is shifted
+proportionally to its height: $shift = round(slant \times height)$.  
+$slant \sim U[-complexity,complexity]$.
+\vspace{1.5cm}
+\end{minipage}
+%\vspace*{-4mm}
+
+%\begin{minipage}[b]{0.14\linewidth}
+%\centering
+\begin{wrapfigure}[8]{l}{0.15\textwidth}
+\vspace*{-6mm}
+\begin{center}
+\includegraphics[scale=.4]{images/Affine_only.PNG}\\
+{\bf Affine}
+\end{center}
+\end{wrapfigure}
+%\end{minipage}%
+%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
 A $2 \times 3$ affine transform matrix (with
-6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
+parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$.
 Output pixel $(x,y)$ takes the value of input pixel
 nearest to $(ax+by+c,dx+ey+f)$,
 producing scaling, translation, rotation and shearing.
-The marginal distributions of $(a,b,c,d,e,f)$ have been tuned by hand to
+Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to
 forbid large rotations (not to confuse classes) but to give good
-variability of the transformation: $a$ and $d$ $\sim U[1-3 \times
-complexity,1+3 \times complexity]$, $b$ and $e$ $\sim[-3 \times complexity,3
-\times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times
-complexity]$.
-\end{minipage}
+variability of the transformation: $a$ and $d$ $\sim U[1-3\,
+complexity,1+3\,complexity]$, $b$ and $e$ $\sim U[-3 \,complexity,3\,
+complexity]$ and $c$ and $f$ $\sim U[-4 \,complexity, 4 \,
+complexity]$.\\
+%\end{minipage}
+
+\vspace*{-4.5mm}
 
-\begin{minipage}[b]{0.14\linewidth}
-\centering
-\includegraphics[scale=.45]{images/Localelasticdistorsions_only.PNG}
-\label{fig:Elastic}
-\end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Local Elastic Deformations.}
-This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short},
+\begin{minipage}[t]{\linewidth}
+\begin{wrapfigure}[7]{l}{0.15\textwidth}
+%\hspace*{-8mm}\begin{minipage}[b]{0.25\linewidth}
+%\centering
+\begin{center}
+\vspace*{-4mm}
+\includegraphics[scale=.4]{images/Localelasticdistorsions_only.PNG}\\
+{\bf Local Elastic}
+\end{center}
+\end{wrapfigure}
+%\end{minipage}%
+%\hspace{-3mm}\begin{minipage}[b]{0.85\linewidth}
+%\vspace*{-20mm}
+This local elastic deformation 
+filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short},
 which provides more details. 
 The intensity of the displacement fields is given by 
 $\alpha = \sqrt[3]{complexity} \times 10.0$, which are 
 convolved with a Gaussian 2D kernel (resulting in a blur) of
 standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$.
-\vspace{.4cm}
+%\vspace{.9cm}
 \end{minipage}
-\vspace{-.7cm}
+
+\vspace*{5mm}
 
-\begin{minipage}[b]{0.14\linewidth}
-\centering
-\includegraphics[scale=.45]{images/Pinch_only.PNG}
-\label{fig:Pinch}
-\vspace{.6cm}
-\end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Pinch.}
-This is the ``Whirl and pinch'' GIMP filter but with whirl was set to 0. 
+%\begin{minipage}[b]{0.14\linewidth}
+%\centering
+\begin{wrapfigure}[7]{l}{0.15\textwidth}
+\vspace*{-5mm}
+\begin{center}
+\includegraphics[scale=.4]{images/Pinch_only.PNG}\\
+{\bf Pinch}
+\end{center}
+\end{wrapfigure}
+%\vspace{.6cm}
+%\end{minipage}%
+%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
+This is the ``Whirl and pinch'' GIMP filter with whirl set to 0. 
 A pinch is ``similar to projecting the image onto an elastic
 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual).
-For a square input image, this is akin to drawing a circle of
-radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
-that disk (region inside circle) will have its value recalculated by taking
-the value of another ``source'' pixel in the original image. The position of
-that source pixel is found on the line that goes through $C$ and $P$, but
-at some other distance $d_2$. Define $d_1$ to be the distance between $P$
-and $C$. $d_2$ is given by $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times
+For a square input image, draw a radius-$r$ disk
+around a center point $C$. Any pixel $P$ belonging to
+that disk has its value replaced by
+the value of a ``source'' pixel in the original image,
+on the line that goes through $C$ and $P$, but
+at some other distance $d_2$. Define $d_1=distance(P,C)$; then $d_2 = \sin(\frac{\pi{}d_1}{2r})^{-pinch} \times
 d_1$, where $pinch$ is a parameter to the filter.
 The actual value is given by bilinear interpolation considering the pixels
 around the (non-integer) source position thus found.
 Here $pinch \sim U[-complexity, 0.7 \times complexity]$.
 %\vspace{1.5cm}
-\end{minipage}
+%\end{minipage}
+
+\vspace{2mm}
 
-\vspace{.1cm}
-
-{\large\bf Injecting Noise}
+{\large\bf 2.2 Injecting Noise}
+%\subsection{Injecting Noise}
+\vspace{2mm}
 
-\vspace*{-.2cm}
-\begin{minipage}[b]{0.14\linewidth}
+%\vspace*{-.2cm}
+\begin{minipage}[t]{0.14\linewidth}
 \centering
-\includegraphics[scale=.45]{images/Motionblur_only.PNG}
-\label{fig:Original}
+\vspace*{-2mm}
+\includegraphics[scale=.4]{images/Motionblur_only.PNG}\\
+{\bf Motion Blur}
 \end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Motion Blur.}
+\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
+%\vspace*{.5mm}
 This is GIMP's ``linear motion blur'' 
 with parameters $length$ and $angle$. The value of
-a pixel in the final image is approximately the  mean value of the first $length$ pixels
-found by moving in the $angle$ direction. 
-Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.
-\vspace{.7cm}
+a pixel in the final image is approximately the mean of the first $length$ pixels
+found by moving in the $angle$ direction, with
+$angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.
+\vspace{5mm}
 \end{minipage}
 
-\vspace*{-5mm}
+\vspace*{1mm}
 
-\begin{minipage}[b]{0.14\linewidth}
+\begin{minipage}[t]{0.14\linewidth}
 \centering
-\includegraphics[scale=.45]{images/occlusion_only.PNG}
-\label{fig:Original}
+\includegraphics[scale=.4]{images/occlusion_only.PNG}\\
+{\bf Occlusion}
+%\vspace{.5cm}
 \end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Occlusion.}
+\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
+\vspace*{-18mm}
 Selects a random rectangle from an {\em occluder} character
 image and places it over the original {\em occluded}
 image. Pixels are combined by taking the max(occluder,occluded),
@@ -285,76 +315,23 @@
 The destination position in the occluded image is also sampled
 according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}).
 This filter is skipped with probability 60\%.
-\vspace{.4cm}
-\end{minipage}
-
-\vspace*{-5mm}
-\begin{minipage}[b]{0.14\linewidth}
-\centering
-\includegraphics[scale=.45]{images/Permutpixel_only.PNG}
-\label{fig:Original}
-\end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Pixel Permutation.}
-This filter permutes neighbouring pixels. It first selects
-fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them are then
-sequentially exchanged with one other in as $V4$ neighbourhood. 
-This filter is skipped with probability 80\%.
-\vspace{.8cm}
+%\vspace{7mm}
 \end{minipage}
 
-
-\begin{minipage}[b]{0.14\linewidth}
-\centering
-\includegraphics[scale=.45]{images/Distorsiongauss_only.PNG}
-\label{fig:Original}
-\end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Gaussian Noise.}
-This filter simply adds, to each pixel of the image independently, a
-noise $\sim Normal(0,(\frac{complexity}{10})^2)$.
-This filter is skipped with probability 70\%.
-\vspace{1.1cm}
-\end{minipage}
-\vspace{-.7cm}
+\vspace*{1mm}
 
-\begin{minipage}[b]{0.14\linewidth}
-\centering
-\includegraphics[scale=.45]{images/background_other_only.png}
-\label{fig:Original}
-\end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Background Images.}
-Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
-background behind the letter, from a randomly chosen natural image,
-with contrast adjustments depending on $complexity$, to preserve
-more or less of the original character image.
-\vspace{.8cm}
-\end{minipage}
-\vspace{-.7cm}
-
-\begin{minipage}[b]{0.14\linewidth}
-\centering
-\includegraphics[scale=.45]{images/Poivresel_only.PNG}
-\label{fig:Original}
-\end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Salt and Pepper Noise.}
-This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
-The number of selected pixels is $0.2 \times complexity$.
-This filter is skipped with probability 75\%.
-\vspace{.9cm}
-\end{minipage}
-\vspace{-.7cm}
-
-\begin{minipage}[b]{0.14\linewidth}
-\centering
-\includegraphics[scale=.45]{images/Bruitgauss_only.PNG}
-\label{fig:Original}
-\vspace{.5cm}
-\end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Spatially Gaussian Smoothing.}
+\begin{wrapfigure}[8]{l}{0.15\textwidth}
+\vspace*{-6mm}
+\begin{center}
+%\begin{minipage}[t]{0.14\linewidth}
+%\centering
+\includegraphics[scale=.4]{images/Bruitgauss_only.PNG}\\
+{\bf Gaussian Smoothing}
+\end{center}
+\end{wrapfigure}
+%\vspace{.5cm}
+%\end{minipage}%
+%\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth}
 Different regions of the image are spatially smoothed by convolving
 the image with a symmetric Gaussian kernel of
 size and variance chosen uniformly in the ranges $[12,12 + 20 \times
@@ -368,17 +345,102 @@
 computed from the following element-wise operation: $\frac{image + filtered
   image \times mask}{mask+1}$.
 This filter is skipped with probability 75\%.
+%\end{minipage}
+
+\newpage
+
+\vspace*{-9mm}
+
+%\hspace*{-3mm}\begin{minipage}[t]{0.18\linewidth}
+%\centering
+\begin{minipage}[t]{\linewidth}
+\begin{wrapfigure}[7]{l}{0.15\textwidth}
+\vspace*{-5mm}
+\begin{center}
+\includegraphics[scale=.4]{images/Permutpixel_only.PNG}\\
+{\small\bf Permute Pixels}
+\end{center}
+\end{wrapfigure}
+%\end{minipage}%
+%\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth}
+%\vspace*{-20mm}
+This filter permutes neighbouring pixels. It first selects a
+fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them is then
+sequentially exchanged with another pixel in its $V4$ neighbourhood. 
+This filter is skipped with probability 80\%.\\
+\vspace*{1mm}
 \end{minipage}
-\vspace{-.7cm}
+
+\vspace{-1mm}
 
-\begin{minipage}[b]{0.14\linewidth}
+\begin{minipage}[t]{\linewidth}
+\begin{wrapfigure}[7]{l}{0.15\textwidth}
+%\vspace*{-3mm}
+\begin{center}
+%\hspace*{-3mm}\begin{minipage}[t]{0.18\linewidth}
+%\centering
+\vspace*{-5mm}
+\includegraphics[scale=.4]{images/Distorsiongauss_only.PNG}\\
+{\small \bf Gauss. Noise}
+\end{center}
+\end{wrapfigure}
+%\end{minipage}%
+%\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth}
+\vspace*{12mm}
+This filter simply adds, to each pixel of the image independently, a
+noise $\sim {\rm Normal}(0,(\frac{complexity}{10})^2)$.
+This filter is skipped with probability 70\%.
+%\vspace{1.1cm}
+\end{minipage}
+
+\vspace*{1.5cm}
+
+\begin{minipage}[t]{\linewidth}
+\begin{minipage}[t]{0.14\linewidth}
 \centering
-\includegraphics[scale=.45]{images/Rature_only.PNG}
-\label{fig:Original}
+\includegraphics[scale=.4]{images/background_other_only.png}\\
+{\small \bf Bg Image}
+\end{minipage}%
+\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
+\vspace*{-18mm}
+Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
+background image behind the letter, from a randomly chosen natural image,
+with contrast adjustments depending on $complexity$, to preserve
+more or less of the original character image.
+%\vspace{.8cm}
+\end{minipage}
+\end{minipage}
+%\vspace{-.7cm}
+
+\begin{minipage}[t]{0.14\linewidth}
+\centering
+\includegraphics[scale=.4]{images/Poivresel_only.PNG}\\
+{\small \bf Salt \& Pepper}
 \end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-\vspace{.4cm}
-{\bf Scratches.}
+\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
+\vspace*{-18mm}
+This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
+The number of selected pixels is $0.2 \times complexity$.
+This filter is skipped with probability 75\%.
+%\vspace{.9cm}
+\end{minipage}
+%\vspace{-.7cm}
+
+\vspace{1mm}
+
+\begin{minipage}[t]{\linewidth}
+\begin{wrapfigure}[7]{l}{0.14\textwidth}
+%\begin{minipage}[t]{0.14\linewidth}
+%\centering
+\begin{center}
+\vspace*{-4mm}
+\hspace*{-1mm}\includegraphics[scale=.4]{images/Rature_only.PNG}\\
+{\bf Scratches}
+%\end{minipage}%
+\end{center}
+\end{wrapfigure}
+%\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth}
+%\vspace{.4cm}
 The scratches module places line-like white patches on the image.  The
 lines are heavily transformed images of the digit ``1'' (one), chosen
 at random among 500 such 1 images,
@@ -390,22 +452,23 @@
 This filter is skipped with probability 85\%. The probabilities
 of applying 1, 2, or 3 patches are (50\%,30\%,20\%).
 \end{minipage}
-\vspace{-.7cm}
 
-\begin{minipage}[b]{0.14\linewidth}
+\vspace*{2mm}
+
+\begin{minipage}[t]{0.20\linewidth}
 \centering
-\includegraphics[scale=.45]{images/Contrast_only.PNG}
-\label{fig:Original}
+\hspace*{-7mm}\includegraphics[scale=.4]{images/Contrast_only.PNG}\\
+{\bf Grey \& Contrast}
 \end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Grey Level and Contrast Changes.}
-This filter changes the contrast and may invert the image polarity (white
+\hspace{-4mm}\begin{minipage}[t]{0.82\linewidth}
+\vspace*{-18mm}
+This filter changes the contrast by changing grey levels, and may invert the image polarity (white
 to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$ 
 so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
 polarity is inverted with probability 50\%.
-\vspace{.7cm}
+%\vspace{.7cm}
 \end{minipage}
-\vspace{-.7cm}
+\vspace{2mm}
 
 
 \iffalse