comparison writeup/nips2010_submission.tex @ 554:e95395f51d72

minor
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Wed, 02 Jun 2010 18:17:52 -0400
parents 8f6c09d1140f
children b6dfba0a110c
comparison
553:8f6c09d1140f 554:e95395f51d72
18 %\makeanontitle 18 %\makeanontitle
19 \maketitle 19 \maketitle
20 20
21 \vspace*{-2mm} 21 \vspace*{-2mm}
22 \begin{abstract} 22 \begin{abstract}
23 Recent theoretical and empirical work in statistical machine learning has 23 Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of unlabeled examples. Here we explore the advantage brought by {\em out-of-distribution examples} and show that {\em deep learners benefit more from them than a corresponding shallow learner}, in the area of handwritten character recognition. In fact, we show that they reach human-level performance on both handwritten digit classification and 62-class handwritten character recognition. For this purpose we developed a powerful generator of stochastic variations and noise processes for character images, including not only affine transformations but also slant, local elastic deformations, changes in thickness, background images, grey level changes, contrast, occlusion, and various types of noise. The out-of-distribution examples are obtained from these highly distorted images or by including examples of object classes different from those in the target test set.
24 demonstrated the importance of learning algorithms for deep
25 architectures, i.e., function classes obtained by composing multiple
26 non-linear transformations. Self-taught learning (exploiting unlabeled
27 examples or examples from other distributions) has already been applied
28 to deep learners, but mostly to show the advantage of unlabeled
29 examples. Here we explore the advantage brought by {\em out-of-distribution
30 examples} and show that {\em deep learners benefit more from them than a
31 corresponding shallow learner}, in the area
32 of handwritten character recognition. In fact, we show that they reach
33 human-level performance on both handwritten digit classification and
34 62-class handwritten character recognition. For this purpose we
35 developed a powerful generator of stochastic variations and noise
36 processes for character images, including not only affine transformations but
37 also slant, local elastic deformations, changes in thickness, background
38 images, grey level changes, contrast, occlusion, and various types of
39 noise. The out-of-distribution examples are
40 obtained from these highly distorted images or
41 by including examples of object classes different from those in the target test set.
42 \end{abstract} 24 \end{abstract}
43 \vspace*{-2mm} 25 \vspace*{-2mm}
44 26
45 \section{Introduction} 27 \section{Introduction}
46 \vspace*{-1mm} 28 \vspace*{-1mm}
181 163
182 164
183 \begin{minipage}[b]{0.14\linewidth} 165 \begin{minipage}[b]{0.14\linewidth}
184 \centering 166 \centering
185 \includegraphics[scale=.45]{images/Thick_only.PNG} 167 \includegraphics[scale=.45]{images/Thick_only.PNG}
186 \label{fig:Think} 168 \label{fig:Thick}
187 \vspace{.9cm} 169 \vspace{.9cm}
188 \end{minipage}% 170 \end{minipage}%
189 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} 171 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
190 {\bf Thinkness.} 172 {\bf Thickness.}
191 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} 173 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
192 are applied. The neighborhood of each pixel is multiplied 174 are applied. The neighborhood of each pixel is multiplied
193 element-wise with a {\em structuring element} matrix. 175 element-wise with a {\em structuring element} matrix.
194 The pixel value is replaced by the maximum or the minimum of the resulting 176 The pixel value is replaced by the maximum or the minimum of the resulting
195 matrix, respectively for dilation or erosion. Ten different structuring elements with 177 matrix, respectively for dilation or erosion. Ten different structuring elements with
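
For readers who want to experiment with the thickness transformation described in this hunk, a minimal Python sketch is given below. It assumes NumPy/SciPy and flat square structuring elements; the ten elements actually used by the generator, and the way $complexity$ selects among them, are not specified in this excerpt, so those choices (and the function name change_thickness) are placeholders rather than the authors' code.

import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def change_thickness(image, rng, complexity=0.5):
    """Thicken (dilate) or thin (erode) strokes with a flat structuring
    element.  The square elements and the rule tying their size to
    `complexity` are placeholders, not the generator's actual choices."""
    # Hypothetical rule: larger elements become possible as complexity grows.
    max_size = 2 + int(3 * complexity)               # element side in {2,...,5}
    size = int(rng.integers(2, max_size + 1))
    footprint = np.ones((size, size), dtype=bool)    # flat square element
    if rng.random() < 0.5:
        return grey_dilation(image, footprint=footprint)   # thicker strokes
    return grey_erosion(image, footprint=footprint)         # thinner strokes

# Usage on a dummy 32x32 image with ink encoded as high grey levels.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
out = change_thickness(img, rng, complexity=0.3)

With a flat (binary) structuring element, taking the maximum or minimum over the covered neighborhood is the standard grey-scale dilation/erosion, which matches the max/min rule described in the text.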
370 \includegraphics[scale=.45]{images/Bruitgauss_only.PNG} 352 \includegraphics[scale=.45]{images/Bruitgauss_only.PNG}
371 \label{fig:Original} 353 \label{fig:Original}
372 \vspace{.5cm} 354 \vspace{.5cm}
373 \end{minipage}% 355 \end{minipage}%
374 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} 356 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
375 {\bf Spatially Gaussian Noise.} 357 {\bf Spatially Gaussian Smoothing.}
376 Different regions of the image are spatially smoothed by convolving 358 Different regions of the image are spatially smoothed by convolving
377 the image is convolved with a symmetric Gaussian kernel of 359 the image with a symmetric Gaussian kernel of
378 size and variance chosen uniformly in the ranges $[12,12 + 20 \times 360 size and variance chosen uniformly in the ranges $[12,12 + 20 \times
379 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized 361 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
380 between $0$ and $1$. We also create a symmetric averaging window, of the 362 between $0$ and $1$. We also create a symmetric weighted averaging window, of the
381 kernel size, with maximum value at the center. For each image we sample 363 kernel size, with maximum value at the center. For each image we sample
382 uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be 364 uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be
383 averaging centers between the original image and the filtered one. We 365 averaging centers between the original image and the filtered one. We
384 initialize to zero a mask matrix of the image size. For each selected pixel 366 initialize to zero a mask matrix of the image size. For each selected pixel
385 we add to the mask the averaging window centered to it. The final image is 367 we add to the mask the averaging window centered to it. The final image is
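
A rough sketch of the spatially localized smoothing described above, under the same NumPy/SciPy assumption. scipy's gaussian_filter sizes its kernel from sigma rather than from an explicit window, so the sampled kernel size is reused only for the averaging window; the Gaussian-bump window and the mask-based blend at the end are guesses at details the excerpt leaves open (the hunk stops before the final step), not the project's actual implementation.

import numpy as np
from scipy.ndimage import gaussian_filter

def local_gaussian_smoothing(image, rng, complexity=0.5):
    """Smooth a few randomly chosen regions of `image` (values in [0, 1])."""
    h, w = image.shape
    # Globally blurred copy; variance drawn from [2, 2 + 6*complexity],
    # result rescaled to [0, 1].
    sigma = np.sqrt(rng.uniform(2.0, 2.0 + 6.0 * complexity))
    blurred = gaussian_filter(image, sigma=sigma)
    blurred = (blurred - blurred.min()) / (blurred.max() - blurred.min() + 1e-8)
    # Symmetric weighted averaging window with its maximum at the centre
    # (a Gaussian bump as a stand-in); odd size in [12, 12 + 20*complexity].
    win = int(rng.uniform(12, 12 + 20 * complexity)) | 1
    ax = np.arange(win) - win // 2
    bump = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * (win / 4.0) ** 2))
    # Accumulate windows centred on 3 to 3 + 10*complexity random pixels.
    mask = np.zeros((h, w))
    n_centres = int(rng.integers(3, 3 + int(10 * complexity) + 1))
    for _ in range(n_centres):
        cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
        y0, x0 = max(cy - win // 2, 0), max(cx - win // 2, 0)
        y1, x1 = min(cy + win // 2 + 1, h), min(cx + win // 2 + 1, w)
        by0, bx0 = y0 - (cy - win // 2), x0 - (cx - win // 2)
        mask[y0:y1, x0:x1] += bump[by0:by0 + (y1 - y0), bx0:bx0 + (x1 - x0)]
    mask = np.clip(mask, 0.0, 1.0)
    # Blend: smoothed where the mask is strong, untouched elsewhere.
    return mask * blurred + (1.0 - mask) * image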
399 {\bf Scratches.} 381 {\bf Scratches.}
400 The scratches module places line-like white patches on the image. The 382 The scratches module places line-like white patches on the image. The
401 lines are heavily transformed images of the digit ``1'' (one), chosen 383 lines are heavily transformed images of the digit ``1'' (one), chosen
402 at random among 500 such 1 images, 384 at random among 500 such 1 images,
403 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times 385 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
404 complexity)^2)$, using bi-cubic interpolation. 386 complexity)^2)$ (in degrees), using bi-cubic interpolation.
405 Two passes of a grey-scale morphological erosion filter 387 Two passes of a grey-scale morphological erosion filter
406 are applied, reducing the width of the line 388 are applied, reducing the width of the line
407 by an amount controlled by $complexity$. 389 by an amount controlled by $complexity$.
408 This filter is skipped with probability 85\%. The probabilities 390 This filter is skipped with probability 85\%. The probabilities
409 of applying 1, 2, or 3 patches are (50\%,30\%,20\%). 391 of applying 1, 2, or 3 patches are (50\%,30\%,20\%).
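
The scratches module can likewise be sketched as follows, again assuming NumPy/SciPy. Here ones_pool stands in for the 500 transformed images of the digit 1 and is assumed to contain patches already matching the image size; the random crop and the rule for compositing the white patches onto the image (an element-wise max here) are assumptions, not the authors' code.

import numpy as np
from scipy.ndimage import rotate, grey_erosion

def add_scratches(image, ones_pool, rng, complexity=0.5):
    """Overlay 1-3 line-like white patches built from images of the digit 1."""
    out = image.copy()
    # 1, 2 or 3 patches with probabilities 50%, 30%, 20%.
    n_patches = int(rng.choice([1, 2, 3], p=[0.5, 0.3, 0.2]))
    for _ in range(n_patches):
        patch = ones_pool[int(rng.integers(len(ones_pool)))].copy()
        # Rotate by an angle ~ Normal(0, (100*complexity)^2) degrees,
        # bi-cubic interpolation (order=3); the random crop is omitted here.
        angle = rng.normal(0.0, 100.0 * complexity)
        patch = rotate(patch, angle, reshape=False, order=3, mode='constant')
        # The thinning filter is skipped with probability 85%; otherwise two
        # passes of grey-scale erosion narrow the line.
        if rng.random() < 0.15:
            footprint = np.ones((2, 2), dtype=bool)
            for _ in range(2):
                patch = grey_erosion(patch, footprint=footprint)
        # Scratches are white: composite with an element-wise max (assumed).
        out = np.maximum(out, np.clip(patch, 0.0, 1.0))
    return out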