diff writeup/nips2010_submission.tex @ 555:b6dfba0a110c
improve the visual appearance, Myriam
author | Yoshua Bengio <bengioy@iro.umontreal.ca> |
---|---|
date | Thu, 03 Jun 2010 08:09:35 -0400 |
parents | e95395f51d72 |
children | 17d16700e0c8 143a1467f157 |
--- a/writeup/nips2010_submission.tex Wed Jun 02 18:17:52 2010 -0400 +++ b/writeup/nips2010_submission.tex Thu Jun 03 08:09:35 2010 -0400 @@ -1,6 +1,6 @@ \documentclass{article} % For LaTeX2e \usepackage{nips10submit_e,times} - +\usepackage{wrapfig} \usepackage{amsthm,amsmath,bbm} \usepackage[psamsfonts]{amssymb} \usepackage{algorithm,algorithmic} @@ -22,7 +22,7 @@ \begin{abstract} Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of unlabeled examples. Here we explore the advantage brought by {\em out-of-distribution examples} and show that {\em deep learners benefit more from them than a corresponding shallow learner}, in the area of handwritten character recognition. In fact, we show that they reach human-level performance on both handwritten digit classification and 62-class handwritten character recognition. For this purpose we developed a powerful generator of stochastic variations and noise processes for character images, including not only affine transformations but also slant, local elastic deformations, changes in thickness, background images, grey level changes, contrast, occlusion, and various types of noise. The out-of-distribution examples are obtained from these highly distorted images or by including examples of object classes different from those in the target test set. \end{abstract} -\vspace*{-2mm} +\vspace*{-3mm} \section{Introduction} \vspace*{-1mm} @@ -77,11 +77,10 @@ (one exception is~\citep{CollobertR2008}, which uses very different kinds of learning algorithms). In particular the {\em relative advantage} of deep learning for these settings has not been evaluated. 
-The hypothesis explored here is that a deep hierarchy of features +The hypothesis discussed in the conclusion is that a deep hierarchy of features may be better able to provide sharing of statistical strength -between different regions in input space or different tasks, -as discussed in the conclusion. - +between different regions in input space or different tasks. +% In this paper we ask the following questions: %\begin{enumerate} @@ -117,19 +116,23 @@ \label{s:perturbations} \vspace*{-1mm} -\begin{minipage}[b]{0.14\linewidth} -\centering -\includegraphics[scale=.45]{images/Original.PNG} -\label{fig:Original} -\vspace{1.2cm} -\end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Original.} +\begin{wrapfigure}[8]{l}{0.15\textwidth} +%\begin{minipage}[b]{0.14\linewidth} +\vspace*{-5mm} +\begin{center} +\includegraphics[scale=.4]{images/Original.PNG}\\ +{\bf Original} +\end{center} +\end{wrapfigure} +%\vspace{0.7cm} +%\end{minipage}% +%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} This section describes the different transformations we used to stochastically transform source images such as the one on the left in order to obtain data from a larger distribution which covers a domain substantially larger than the clean characters distribution from -which we start. Although character transformations have been used before to +which we start. +Although character transformations have been used before to improve character recognizers, this effort is on a large scale both in number of classes and in the complexity of the transformations, hence in the complexity of the learning task. @@ -142,34 +145,26 @@ There are two main parts in the pipeline. The first one, from slant to pinch below, performs transformations. The second part, from blur to contrast, adds different kinds of noise. 
-\end{minipage} +%\end{minipage} -{\large\bf Transformations} +\vspace*{1mm} +%\subsection{Transformations} +{\large\bf 2.1 Transformations} +\vspace*{1mm} -\begin{minipage}[b]{0.14\linewidth} -\centering -\includegraphics[scale=.45]{images/Slant_only.PNG} -\label{fig:Slant} -\end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} +\begin{wrapfigure}[7]{l}{0.15\textwidth} +%\begin{minipage}[b]{0.14\linewidth} %\centering -{\bf Slant.} -Each row of the image is shifted -proportionally to its height: $shift = round(slant \times height)$. -$slant \sim U[-complexity,complexity]$. -\vspace{1.2cm} -\end{minipage} - - -\begin{minipage}[b]{0.14\linewidth} -\centering -\includegraphics[scale=.45]{images/Thick_only.PNG} -\label{fig:Thick} -\vspace{.9cm} -\end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Thickness.} +\begin{center} +\vspace*{-5mm} +\includegraphics[scale=.4]{images/Thick_only.PNG}\\ +{\bf Thickness} +\end{center} +%\vspace{.6cm} +%\end{minipage}% +%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} +\end{wrapfigure} Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} are applied. The neighborhood of each pixel is multiplied element-wise with a {\em structuring element} matrix. @@ -181,102 +176,137 @@ where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters). A neutral element (no transformation) is always present in the set. is applied. 
-\vspace{.4cm} -\end{minipage} -\vspace{-.7cm} - +%\vspace{.4cm} +%\end{minipage} +%\vspace{-.7cm} \begin{minipage}[b]{0.14\linewidth} \centering -\includegraphics[scale=.45]{images/Affine_only.PNG} -\label{fig:Affine} +\includegraphics[scale=.4]{images/Slant_only.PNG}\\ +{\bf Slant} \end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Affine Transformations.} +\hspace{0.3cm}\begin{minipage}[b]{0.83\linewidth} +%\centering +%\vspace*{-15mm} +Each row of the image is shifted +proportionally to its height: $shift = round(slant \times height)$. +$slant \sim U[-complexity,complexity]$. +\vspace{1.5cm} +\end{minipage} +%\vspace*{-4mm} + +%\begin{minipage}[b]{0.14\linewidth} +%\centering +\begin{wrapfigure}[8]{l}{0.15\textwidth} +\vspace*{-6mm} +\begin{center} +\includegraphics[scale=.4]{images/Affine_only.PNG}\\ +{\bf Affine} +\end{center} +\end{wrapfigure} +%\end{minipage}% +%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} A $2 \times 3$ affine transform matrix (with -6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level. +parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$. Output pixel $(x,y)$ takes the value of input pixel nearest to $(ax+by+c,dx+ey+f)$, producing scaling, translation, rotation and shearing. -The marginal distributions of $(a,b,c,d,e,f)$ have been tuned by hand to +Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to forbid large rotations (not to confuse classes) but to give good -variability of the transformation: $a$ and $d$ $\sim U[1-3 \times -complexity,1+3 \times complexity]$, $b$ and $e$ $\sim[-3 \times complexity,3 -\times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times -complexity]$. 
-\end{minipage} +variability of the transformation: $a$ and $d$ $\sim U[1-3 +complexity,1+3\,complexity]$, $b$ and $e$ $\sim[-3 \,complexity,3\, +complexity]$ and $c$ and $f$ $\sim U[-4 \,complexity, 4 \, +complexity]$.\\ +%\end{minipage} + +\vspace*{-4.5mm} -\begin{minipage}[b]{0.14\linewidth} -\centering -\includegraphics[scale=.45]{images/Localelasticdistorsions_only.PNG} -\label{fig:Elastic} -\end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Local Elastic Deformations.} -This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, +\begin{minipage}[t]{\linewidth} +\begin{wrapfigure}[7]{l}{0.15\textwidth} +%\hspace*{-8mm}\begin{minipage}[b]{0.25\linewidth} +%\centering +\begin{center} +\vspace*{-4mm} +\includegraphics[scale=.4]{images/Localelasticdistorsions_only.PNG}\\ +{\bf Local Elastic} +\end{center} +\end{wrapfigure} +%\end{minipage}% +%\hspace{-3mm}\begin{minipage}[b]{0.85\linewidth} +%\vspace*{-20mm} +This local elastic deformation +filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, which provides more details. The intensity of the displacement fields is given by $\alpha = \sqrt[3]{complexity} \times 10.0$, which are convolved with a Gaussian 2D kernel (resulting in a blur) of standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$. -\vspace{.4cm} +%\vspace{.9cm} \end{minipage} -\vspace{-.7cm} + +\vspace*{5mm} -\begin{minipage}[b]{0.14\linewidth} -\centering -\includegraphics[scale=.45]{images/Pinch_only.PNG} -\label{fig:Pinch} -\vspace{.6cm} -\end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Pinch.} -This is the ``Whirl and pinch'' GIMP filter but with whirl was set to 0. 
+%\begin{minipage}[b]{0.14\linewidth} +%\centering +\begin{wrapfigure}[7]{l}{0.15\textwidth} +\vspace*{-5mm} +\begin{center} +\includegraphics[scale=.4]{images/Pinch_only.PNG}\\ +{\bf Pinch} +\end{center} +\end{wrapfigure} +%\vspace{.6cm} +%\end{minipage}% +%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} +This is the ``Whirl and pinch'' GIMP filter with whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). -For a square input image, this is akin to drawing a circle of -radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to -that disk (region inside circle) will have its value recalculated by taking -the value of another ``source'' pixel in the original image. The position of -that source pixel is found on the line that goes through $C$ and $P$, but -at some other distance $d_2$. Define $d_1$ to be the distance between $P$ -and $C$. $d_2$ is given by $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times +For a square input image, draw a radius-$r$ disk +around $C$. Any pixel $P$ belonging to +that disk has its value replaced by +the value of a ``source'' pixel in the original image, +on the line that goes through $C$ and $P$, but +at some other distance $d_2$. Define $d_1=distance(P,C) = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times d_1$, where $pinch$ is a parameter to the filter. The actual value is given by bilinear interpolation considering the pixels around the (non-integer) source position thus found. Here $pinch \sim U[-complexity, 0.7 \times complexity]$. 
%\vspace{1.5cm} -\end{minipage} +%\end{minipage} + +\vspace{2mm} -\vspace{.1cm} - -{\large\bf Injecting Noise} +{\large\bf 2.2 Injecting Noise} +%\subsection{Injecting Noise} +\vspace{2mm} -\vspace*{-.2cm} -\begin{minipage}[b]{0.14\linewidth} +%\vspace*{-.2cm} +\begin{minipage}[t]{0.14\linewidth} \centering -\includegraphics[scale=.45]{images/Motionblur_only.PNG} -\label{fig:Original} +\vspace*{-2mm} +\includegraphics[scale=.4]{images/Motionblur_only.PNG}\\ +{\bf Motion Blur} \end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Motion Blur.} +\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} +%\vspace*{.5mm} This is GIMP's ``linear motion blur'' with parameters $length$ and $angle$. The value of -a pixel in the final image is approximately the mean value of the first $length$ pixels -found by moving in the $angle$ direction. -Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$. -\vspace{.7cm} +a pixel in the final image is approximately the mean of the first $length$ pixels +found by moving in the $angle$ direction, +$angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$. +\vspace{5mm} \end{minipage} -\vspace*{-5mm} +\vspace*{1mm} -\begin{minipage}[b]{0.14\linewidth} +\begin{minipage}[t]{0.14\linewidth} \centering -\includegraphics[scale=.45]{images/occlusion_only.PNG} -\label{fig:Original} +\includegraphics[scale=.4]{images/occlusion_only.PNG}\\ +{\bf Occlusion} +%\vspace{.5cm} \end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Occlusion.} +\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} +\vspace*{-18mm} Selects a random rectangle from an {\em occluder} character image and places it over the original {\em occluded} image. Pixels are combined by taking the max(occluder,occluded), @@ -285,76 +315,23 @@ The destination position in the occluded image are also sampled according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}). 
This filter is skipped with probability 60\%. -\vspace{.4cm} -\end{minipage} - -\vspace*{-5mm} -\begin{minipage}[b]{0.14\linewidth} -\centering -\includegraphics[scale=.45]{images/Permutpixel_only.PNG} -\label{fig:Original} -\end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Pixel Permutation.} -This filter permutes neighbouring pixels. It first selects -fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them are then -sequentially exchanged with one other in as $V4$ neighbourhood. -This filter is skipped with probability 80\%. -\vspace{.8cm} +%\vspace{7mm} \end{minipage} - -\begin{minipage}[b]{0.14\linewidth} -\centering -\includegraphics[scale=.45]{images/Distorsiongauss_only.PNG} -\label{fig:Original} -\end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Gaussian Noise.} -This filter simply adds, to each pixel of the image independently, a -noise $\sim Normal(0,(\frac{complexity}{10})^2)$. -This filter is skipped with probability 70\%. -\vspace{1.1cm} -\end{minipage} -\vspace{-.7cm} +\vspace*{1mm} -\begin{minipage}[b]{0.14\linewidth} -\centering -\includegraphics[scale=.45]{images/background_other_only.png} -\label{fig:Original} -\end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Background Images.} -Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random -background behind the letter, from a randomly chosen natural image, -with contrast adjustments depending on $complexity$, to preserve -more or less of the original character image. -\vspace{.8cm} -\end{minipage} -\vspace{-.7cm} - -\begin{minipage}[b]{0.14\linewidth} -\centering -\includegraphics[scale=.45]{images/Poivresel_only.PNG} -\label{fig:Original} -\end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Salt and Pepper Noise.} -This filter adds noise $\sim U[0,1]$ to random subsets of pixels. -The number of selected pixels is $0.2 \times complexity$. 
-This filter is skipped with probability 75\%. -\vspace{.9cm} -\end{minipage} -\vspace{-.7cm} - -\begin{minipage}[b]{0.14\linewidth} -\centering -\includegraphics[scale=.45]{images/Bruitgauss_only.PNG} -\label{fig:Original} -\vspace{.5cm} -\end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Spatially Gaussian Smoothing.} +\begin{wrapfigure}[8]{l}{0.15\textwidth} +\vspace*{-6mm} +\begin{center} +%\begin{minipage}[t]{0.14\linewidth} +%\centering +\includegraphics[scale=.4]{images/Bruitgauss_only.PNG}\\ +{\bf Gaussian Smoothing} +\end{center} +\end{wrapfigure} +%\vspace{.5cm} +%\end{minipage}% +%\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} Different regions of the image are spatially smoothed by convolving the image with a symmetric Gaussian kernel of size and variance chosen uniformly in the ranges $[12,12 + 20 \times @@ -368,17 +345,102 @@ computed from the following element-wise operation: $\frac{image + filtered image \times mask}{mask+1}$. This filter is skipped with probability 75\%. +%\end{minipage} + +\newpage + +\vspace*{-9mm} + +%\hspace*{-3mm}\begin{minipage}[t]{0.18\linewidth} +%\centering +\begin{minipage}[t]{\linewidth} +\begin{wrapfigure}[7]{l}{0.15\textwidth} +\vspace*{-5mm} +\begin{center} +\includegraphics[scale=.4]{images/Permutpixel_only.PNG}\\ +{\small\bf Permute Pixels} +\end{center} +\end{wrapfigure} +%\end{minipage}% +%\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth} +%\vspace*{-20mm} +This filter permutes neighbouring pixels. It first selects +fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them are then +sequentially exchanged with one other in as $V4$ neighbourhood. 
+This filter is skipped with probability 80\%.\\ +\vspace*{1mm} \end{minipage} -\vspace{-.7cm} + +\vspace{-1mm} -\begin{minipage}[b]{0.14\linewidth} +\begin{minipage}[t]{\linewidth} +\begin{wrapfigure}[7]{l}{0.15\textwidth} +%\vspace*{-3mm} +\begin{center} +%\hspace*{-3mm}\begin{minipage}[t]{0.18\linewidth} +%\centering +\vspace*{-5mm} +\includegraphics[scale=.4]{images/Distorsiongauss_only.PNG}\\ +{\small \bf Gauss. Noise} +\end{center} +\end{wrapfigure} +%\end{minipage}% +%\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} +\vspace*{12mm} +This filter simply adds, to each pixel of the image independently, a +noise $\sim Normal(0,(\frac{complexity}{10})^2)$. +This filter is skipped with probability 70\%. +%\vspace{1.1cm} +\end{minipage} + +\vspace*{1.5cm} + +\begin{minipage}[t]{\linewidth} +\begin{minipage}[t]{0.14\linewidth} \centering -\includegraphics[scale=.45]{images/Rature_only.PNG} -\label{fig:Original} +\includegraphics[scale=.4]{images/background_other_only.png}\\ +{\small \bf Bg Image} +\end{minipage}% +\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} +\vspace*{-18mm} +Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random +background image behind the letter, from a randomly chosen natural image, +with contrast adjustments depending on $complexity$, to preserve +more or less of the original character image. +%\vspace{.8cm} +\end{minipage} +\end{minipage} +%\vspace{-.7cm} + +\begin{minipage}[t]{0.14\linewidth} +\centering +\includegraphics[scale=.4]{images/Poivresel_only.PNG}\\ +{\small \bf Salt \& Pepper} \end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -\vspace{.4cm} -{\bf Scratches.} +\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} +\vspace*{-18mm} +This filter adds noise $\sim U[0,1]$ to random subsets of pixels. +The number of selected pixels is $0.2 \times complexity$. +This filter is skipped with probability 75\%. 
+%\vspace{.9cm} +\end{minipage} +%\vspace{-.7cm} + +\vspace{1mm} + +\begin{minipage}[t]{\linewidth} +\begin{wrapfigure}[7]{l}{0.14\textwidth} +%\begin{minipage}[t]{0.14\linewidth} +%\centering +\begin{center} +\vspace*{-4mm} +\hspace*{-1mm}\includegraphics[scale=.4]{images/Rature_only.PNG}\\ +{\bf Scratches} +%\end{minipage}% +\end{center} +\end{wrapfigure} +%\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} +%\vspace{.4cm} The scratches module places line-like white patches on the image. The lines are heavily transformed images of the digit ``1'' (one), chosen at random among 500 such 1 images, @@ -390,22 +452,23 @@ This filter is skipped with probability 85\%. The probabilities of applying 1, 2, or 3 patches are (50\%,30\%,20\%). \end{minipage} -\vspace{-.7cm} -\begin{minipage}[b]{0.14\linewidth} +\vspace*{2mm} + +\begin{minipage}[t]{0.20\linewidth} \centering -\includegraphics[scale=.45]{images/Contrast_only.PNG} -\label{fig:Original} +\hspace*{-7mm}\includegraphics[scale=.4]{images/Contrast_only.PNG}\\ +{\bf Grey \& Contrast} \end{minipage}% -\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -{\bf Grey Level and Contrast Changes.} -This filter changes the contrast and may invert the image polarity (white +\hspace{-4mm}\begin{minipage}[t]{0.82\linewidth} +\vspace*{-18mm} +This filter changes the contrast by changing grey levels, and may invert the image polarity (white to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$ so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The polarity is inverted with probability 50\%. -\vspace{.7cm} +%\vspace{.7cm} \end{minipage} -\vspace{-.7cm} +\vspace{2mm} \iffalse