# HG changeset patch
# User Yoshua Bengio
# Date 1275517072 14400
# Node ID e95395f51d7274575ad25e908e933c091112acb2
# Parent 8f6c09d1140f842b1e78a80bf44292a886ba2fb8
minor

diff -r 8f6c09d1140f -r e95395f51d72 writeup/nips2010_submission.tex
--- a/writeup/nips2010_submission.tex	Wed Jun 02 17:40:43 2010 -0400
+++ b/writeup/nips2010_submission.tex	Wed Jun 02 18:17:52 2010 -0400
@@ -20,25 +20,7 @@
 
 \vspace*{-2mm}
 \begin{abstract}
- Recent theoretical and empirical work in statistical machine learning has
- demonstrated the importance of learning algorithms for deep
- architectures, i.e., function classes obtained by composing multiple
- non-linear transformations. Self-taught learning (exploiting unlabeled
- examples or examples from other distributions) has already been applied
- to deep learners, but mostly to show the advantage of unlabeled
- examples. Here we explore the advantage brought by {\em out-of-distribution
- examples} and show that {\em deep learners benefit more from them than a
- corresponding shallow learner}, in the area
- of handwritten character recognition. In fact, we show that they reach
- human-level performance on both handwritten digit classification and
- 62-class handwritten character recognition. For this purpose we
- developed a powerful generator of stochastic variations and noise
- processes for character images, including not only affine transformations but
- also slant, local elastic deformations, changes in thickness, background
- images, grey level changes, contrast, occlusion, and various types of
- noise. The out-of-distribution examples are
- obtained from these highly distorted images or
- by including examples of object classes different from those in the target test set.
+Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of unlabeled examples. Here we explore the advantage brought by {\em out-of-distribution examples} and show that {\em deep learners benefit more from them than a corresponding shallow learner}, in the area of handwritten character recognition. In fact, we show that they reach human-level performance on both handwritten digit classification and 62-class handwritten character recognition. For this purpose we developed a powerful generator of stochastic variations and noise processes for character images, including not only affine transformations but also slant, local elastic deformations, changes in thickness, background images, grey level changes, contrast, occlusion, and various types of noise. The out-of-distribution examples are obtained from these highly distorted images or by including examples of object classes different from those in the target test set.
 \end{abstract}
 
 \vspace*{-2mm}
@@ -183,11 +165,11 @@
 \begin{minipage}[b]{0.14\linewidth}
 \centering
 \includegraphics[scale=.45]{images/Thick_only.PNG}
-\label{fig:Think}
+\label{fig:Thick}
 \vspace{.9cm}
 \end{minipage}%
 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Thinkness.}
+{\bf Thickness.}
 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
 are applied. The neighborhood of each pixel is multiplied
 element-wise with a {\em structuring element} matrix.
@@ -372,12 +354,12 @@
 \vspace{.5cm}
 \end{minipage}%
 \hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-{\bf Spatially Gaussian Noise.}
+{\bf Spatially Gaussian Smoothing.}
 Different regions of the image are spatially smoothed by convolving
-the image is convolved with a symmetric Gaussian kernel of
+the image with a symmetric Gaussian kernel of
 size and variance chosen uniformly in the ranges $[12,12 + 20 \times complexity]$ and $[2,2 + 6 \times complexity]$.
 The result is normalized
-between $0$ and $1$. We also create a symmetric averaging window, of the
+between $0$ and $1$. We also create a symmetric weighted averaging window, of the
 kernel size, with maximum value at the center. For each image we sample
 uniformly from $3$ to $3 + 10 \times complexity$ pixels that will
 be averaging centers between the original image and the filtered one. We
@@ -401,7 +383,7 @@
 lines are heavily transformed images of the digit ``1''
 (one), chosen at random among 500 such 1 images,
 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
-complexity)^2$, using bi-cubic interpolation.
+complexity)^2$ (in degrees), using bi-cubic interpolation.
 Two passes of a grey-scale morphological erosion filter
 are applied, reducing the width of the line
 by an amount controlled by $complexity$.
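Note on the "Thickness" section renamed above: the transform it describes (dilation/erosion with a structuring element) can be sketched with grey-scale morphology. This is an illustrative sketch, not the paper's code; the function name, the flat 3x3 structuring element, and the ink-is-bright convention are assumptions.

```python
import numpy as np
from scipy.ndimage import grey_dilation, grey_erosion

def vary_thickness(image, thicken, size=3):
    """Thicken or thin strokes with grey-scale dilation or erosion.

    Assumes ink pixels are high values in [0, 1]; the flat size x size
    structuring element is an illustrative choice, not the paper's exact one.
    """
    footprint = np.ones((size, size), dtype=bool)
    op = grey_dilation if thicken else grey_erosion
    # Each output pixel is the max (dilation) or min (erosion) of its
    # neighborhood, i.e., the neighborhood combined with the structuring element.
    return op(image, footprint=footprint)
```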
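For the "Spatially Gaussian Smoothing" operation renamed in the third hunk, a sketch follows. The parameter ranges (kernel size, variance, number of averaging centers) come from the text; the function name, the RNG plumbing, and the exact blending-window shape are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def spatial_gaussian_smoothing(image, complexity, rng):
    """Blend an image with a Gaussian-smoothed copy around random centers.

    Ranges follow the text: variance in [2, 2 + 6*complexity], kernel size in
    [12, 12 + 20*complexity], and 3 to 3 + 10*complexity averaging centers.
    """
    variance = rng.uniform(2.0, 2.0 + 6.0 * complexity)
    smoothed = gaussian_filter(image, sigma=np.sqrt(variance))
    # Normalize the smoothed image between 0 and 1.
    smoothed = (smoothed - smoothed.min()) / (smoothed.max() - smoothed.min() + 1e-12)

    size = int(rng.uniform(12, 12 + 20 * complexity)) | 1  # force odd width
    half = size // 2
    # Symmetric weighted averaging window of the kernel size, maximum 1 at center
    # (linear falloff in Chebyshev distance is an illustrative choice).
    dist = np.abs(np.arange(size) - half)
    window = 1.0 - np.maximum.outer(dist, dist) / (half + 1.0)

    out = image.astype(float).copy()
    h, w = image.shape
    n_centers = int(rng.integers(3, int(3 + 10 * complexity) + 1))
    for _ in range(n_centers):
        cy = int(rng.integers(half, h - half))
        cx = int(rng.integers(half, w - half))
        patch = (slice(cy - half, cy + half + 1), slice(cx - half, cx + half + 1))
        # Weighted average between the original and the filtered image.
        out[patch] = (1.0 - window) * out[patch] + window * smoothed[patch]
    return np.clip(out, 0.0, 1.0)
```

The sketch only handles centers far enough from the border for a full window; the paper's handling of border centers is not specified here.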
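The last hunk clarifies that the rotation angle for the background "1" lines is in degrees. A sketch of that step under stated assumptions: scipy's spline `order=3` stands in for bi-cubic interpolation, a 3x3 erosion footprint is assumed, and the random cropping and 500-image pool are omitted.

```python
import numpy as np
from scipy.ndimage import rotate, grey_erosion

def distort_line_image(one_image, complexity, rng):
    """Rotate a '1' image by an angle ~ Normal(0, (100*complexity)^2) degrees,
    then thin it with two passes of grey-scale erosion, as the text describes.
    """
    angle = rng.normal(0.0, 100.0 * complexity)  # standard deviation in degrees
    img = rotate(one_image.astype(float), angle, reshape=False, order=3)
    for _ in range(2):  # two erosion passes reduce the line width
        img = grey_erosion(img, size=(3, 3))
    return np.clip(img, 0.0, 1.0)
```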