diff writeup/nips2010_submission.tex @ 554:e95395f51d72

minor
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Wed, 02 Jun 2010 18:17:52 -0400
parents 8f6c09d1140f
children b6dfba0a110c
--- a/writeup/nips2010_submission.tex	Wed Jun 02 17:40:43 2010 -0400
+++ b/writeup/nips2010_submission.tex	Wed Jun 02 18:17:52 2010 -0400
@@ -20,25 +20,7 @@
 
 \vspace*{-2mm}
 \begin{abstract}
-  Recent theoretical and empirical work in statistical machine learning has
-  demonstrated the importance of learning algorithms for deep
-  architectures, i.e., function classes obtained by composing multiple
-  non-linear transformations. Self-taught learning (exploiting unlabeled
-  examples or examples from other distributions) has already been applied
-  to deep learners, but mostly to show the advantage of unlabeled
-  examples. Here we explore the advantage brought by {\em out-of-distribution
-  examples} and show that {\em deep learners benefit more from them than a
-  corresponding shallow learner}, in the area
-  of handwritten character recognition. In fact, we show that they reach
-  human-level performance on both handwritten digit classification and
-  62-class handwritten character recognition.  For this purpose we
-  developed a powerful generator of stochastic variations and noise
-  processes for character images, including not only affine transformations but
-  also slant, local elastic deformations, changes in thickness, background
-  images, grey level changes, contrast, occlusion, and various types of
-  noise. The out-of-distribution examples are 
-  obtained from these highly distorted images or
-  by including examples of object classes different from those in the target test set.
+Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of unlabeled examples. Here we explore the advantage brought by {\em out-of-distribution examples} and show that {\em deep learners benefit more from them than a corresponding shallow learner}, in the area of handwritten character recognition. In fact, we show that they reach human-level performance on both handwritten digit classification and 62-class handwritten character recognition.  For this purpose we developed a powerful generator of stochastic variations and noise processes for character images, including not only affine transformations but also slant, local elastic deformations, changes in thickness, background images, grey level changes, contrast, occlusion, and various types of noise. The out-of-distribution examples are obtained from these highly distorted images or by including examples of object classes different from those in the target test set.
 \end{abstract}
 \vspace*{-2mm}
 
@@ -183,11 +165,11 @@
 \begin{minipage}[b]{0.14\linewidth}
 \centering
 \includegraphics[scale=.45]{images/Thick_only.PNG}
-\label{fig:Think}
+\label{fig:Thick}
 \vspace{.9cm}
 \end{minipage}%
 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Thinkness.}
+{\bf Thickness.}
 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
 are applied. The neighborhood of each pixel is multiplied
 element-wise with a {\em structuring element} matrix.
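The thickness transformation above can be sketched as grey-scale dilation or erosion with a structuring element. This is an illustrative sketch, not the paper's implementation; the function `vary_thickness` and the way `complexity` scales the element size are assumptions.

```python
# Hypothetical sketch of the thickness transformation: grey-scale
# dilation (thicken) or erosion (thin) with a square structuring
# element, using SciPy. How complexity maps to the element size
# is an assumption for illustration.
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def vary_thickness(img, complexity, rng):
    # Structuring element whose size grows with complexity.
    size = 1 + rng.integers(0, max(1, int(2 * complexity) + 1))
    selem = np.ones((size, size))
    # Dilate or erode with equal probability.
    if rng.random() < 0.5:
        return grey_dilation(img, footprint=selem)
    return grey_erosion(img, footprint=selem)

rng = np.random.default_rng(0)
img = np.zeros((8, 8)); img[3:5, 2:6] = 1.0   # a small bar
out = vary_thickness(img, complexity=0.5, rng=rng)
```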
@@ -372,12 +354,12 @@
 \vspace{.5cm}
 \end{minipage}%
 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Spatially Gaussian Noise.}
+{\bf Spatial Gaussian Smoothing.}
 Different regions of the image are spatially smoothed by convolving
-the image is convolved with a symmetric Gaussian kernel of
+the image with a symmetric Gaussian kernel of
 size and variance chosen uniformly in the ranges $[12,12 + 20 \times
 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
-between $0$ and $1$.  We also create a symmetric averaging window, of the
+between $0$ and $1$.  We also create a symmetric weighted averaging window of the
 kernel size, with maximum value at the center.  For each image we sample
 uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be
 averaging centers between the original image and the filtered one.  We
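The smoothing step in this hunk can be sketched as follows: filter the whole image with a Gaussian kernel, then blend original and filtered versions through weighted windows centred on randomly sampled pixels. This is a sketch under stated assumptions, not the paper's code; the triangular window shape and the `alpha` blending are illustrative choices.

```python
# Hypothetical sketch of the spatial Gaussian smoothing: variance and
# kernel size are drawn from the complexity-dependent ranges given in
# the text; the blending details (triangular window, max-combined
# alpha map) are assumptions for illustration.
import numpy as np
from scipy.ndimage import gaussian_filter

def spatial_gaussian_smoothing(img, complexity, rng):
    h, w = img.shape
    var = rng.uniform(2.0, 2.0 + 6.0 * complexity)        # variance range from the text
    size = int(rng.uniform(12, 12 + 20 * complexity)) | 1  # odd window size
    smoothed = gaussian_filter(img, sigma=np.sqrt(var))
    # Normalize the filtered image to [0, 1].
    smoothed = (smoothed - smoothed.min()) / (smoothed.max() - smoothed.min() + 1e-12)
    # Symmetric weighted window, maximal at the centre.
    ax = 1.0 - np.abs(np.linspace(-1.0, 1.0, size))
    window = np.outer(ax, ax)
    alpha = np.zeros_like(img)        # per-pixel weight of the smoothed image
    n_centres = rng.integers(3, 3 + int(10 * complexity) + 1)
    for _ in range(n_centres):
        cy, cx = rng.integers(0, h), rng.integers(0, w)
        y0, x0 = max(0, cy - size // 2), max(0, cx - size // 2)
        y1, x1 = min(h, cy + size // 2 + 1), min(w, cx + size // 2 + 1)
        wy0, wx0 = y0 - (cy - size // 2), x0 - (cx - size // 2)
        patch = window[wy0:wy0 + (y1 - y0), wx0:wx0 + (x1 - x0)]
        alpha[y0:y1, x0:x1] = np.maximum(alpha[y0:y1, x0:x1], patch)
    # Blend: original away from the centres, smoothed near them.
    return (1.0 - alpha) * img + alpha * smoothed

rng = np.random.default_rng(1)
img = rng.random((16, 16))
res = spatial_gaussian_smoothing(img, complexity=0.5, rng=rng)
```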
@@ -401,7 +383,7 @@
 lines are heavily transformed images of the digit ``1'' (one), chosen
 at random among 500 such 1 images,
 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
-complexity)^2$, using bi-cubic interpolation.
+complexity)^2)$ (in degrees), using bi-cubic interpolation.
 Two passes of a grey-scale morphological erosion filter
 are applied, reducing the width of the line
 by an amount controlled by $complexity$.
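The line-distortion step in this last hunk can be sketched as: rotate a "1" image by a Normal-distributed angle (in degrees) with bi-cubic interpolation, then apply two grey-scale erosion passes to thin the line. A minimal sketch, assuming a 3x3 erosion footprint; the paper's actual erosion amount is controlled by `complexity`.

```python
# Hypothetical sketch of the background-line distortion: rotation angle
# ~ Normal(0, (100*complexity)^2) in degrees, bi-cubic interpolation
# (order=3), then two erosion passes. The 3x3 footprint is an
# assumption for illustration.
import numpy as np
from scipy.ndimage import rotate, grey_erosion

def distort_line(one_img, complexity, rng):
    angle = rng.normal(0.0, 100.0 * complexity)           # std dev in degrees
    out = rotate(one_img, angle, reshape=False, order=3)  # bi-cubic interpolation
    for _ in range(2):                                    # two erosion passes thin the line
        out = grey_erosion(out, size=(3, 3))
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(2)
one = np.zeros((16, 16)); one[2:14, 7:9] = 1.0   # a crude "1" stroke
line = distort_line(one, complexity=0.3, rng=rng)
```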