ift6266: writeup/nips2010_submission.tex comparison

comparison writeup/nips2010_submission.tex @ 559:cf5a7ee2d892

Merged

author	Olivier Delalleau <delallea@iro>
date	Thu, 03 Jun 2010 09:18:02 -0400
parents	143a1467f157 17d16700e0c8
children	dc5c3f538a05

-:143a1467f157
+:cf5a7ee2d892
 a corresponding shallow and purely supervised architecture?
 %\end{enumerate}
 Our experimental results provide positive evidence towards all of these questions.
 To achieve these results, we introduce in the next section a sophisticated system
-for stochastically transforming character images. The conclusion discusses
+for stochastically transforming character images and then explain the methodology.
+The conclusion discusses
 the more general question of why deep learners may benefit so much from
 the self-taught learning framework.
 \vspace*{-1mm}
 \section{Perturbation and Transformation of Character Images}
 \end{center}
 %\vspace{.6cm}
 %\end{minipage}%
 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
 \end{wrapfigure}
-Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
+To change character {\bf thickness}, morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
 are applied. The neighborhood of each pixel is multiplied
 element-wise with a {\em structuring element} matrix.
 The pixel value is replaced by the maximum or the minimum of the resulting
 matrix, respectively for dilation or erosion. Ten different structuring elements with
 increasing dimensions (largest is $5\times5$) were used.  For each image,
 {\bf Slant}
 \end{minipage}%
 \hspace{0.3cm}\begin{minipage}[b]{0.83\linewidth}
 %\centering
 %\vspace*{-15mm}
-Each row of the image is shifted
+To produce {\bf slant}, each row of the image is shifted
 proportionally to its height: $shift = round(slant \times height)$.
 $slant \sim U[-complexity,complexity]$.
 \vspace{1.5cm}
 \end{minipage}
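As an illustration of the slant rule $shift = round(slant \times height)$, here is a minimal sketch in plain Python. The function names, the list-of-rows image representation, the zero fill value, and the choice to measure $height$ from the bottom row are assumptions made for this example; they are not taken from the authors' code.

```python
import random


def sample_slant(complexity, rng=None):
    """Draw slant ~ U[-complexity, complexity]."""
    rng = rng if rng is not None else random.Random()
    return rng.uniform(-complexity, complexity)


def slant_image(image, slant, fill=0.0):
    """Shift each row horizontally by round(slant * height).

    Assumption: `image` is a list of equal-length rows and the height of a
    row is counted from the bottom of the image (bottom row has height 0).
    Pixels shifted outside the image are dropped; vacated pixels get `fill`.
    """
    n_rows = len(image)
    width = len(image[0])
    out = []
    for r, row in enumerate(image):
        height = n_rows - 1 - r
        shift = round(slant * height)
        new_row = [fill] * width
        for x, value in enumerate(row):
            nx = x + shift
            if 0 <= nx < width:
                new_row[nx] = value
        out.append(new_row)
    return out
```

With a vertical stroke in a $3 \times 4$ image and $slant = 1$, the top row moves two pixels to the right, the middle row one pixel, and the bottom row stays in place.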
 %\vspace*{-4mm}
 %\centering
 \begin{wrapfigure}[8]{l}{0.15\textwidth}
 \vspace*{-6mm}
 \begin{center}
 \includegraphics[scale=.4]{images/Affine_only.png}\\
-{\bf Affine}
+{\bf Affine Transformation}
 \end{center}
 \end{wrapfigure}
 %\end{minipage}%
 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-A $2 \times 3$ affine transform matrix (with
+A $2 \times 3$ {\bf affine transform} matrix (with
 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$.
 Output pixel $(x,y)$ takes the value of input pixel
 nearest to $(ax+by+c,dx+ey+f)$,
 producing scaling, translation, rotation and shearing.
 Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to
 \end{center}
 \end{wrapfigure}
 %\end{minipage}%
 %\hspace{-3mm}\begin{minipage}[b]{0.85\linewidth}
 %\vspace*{-20mm}
-This local elastic deformation
+The {\bf local elastic} deformation
-filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short},
+module induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short},
 which provides more details.
 The intensity of the displacement fields is given by
 $\alpha = \sqrt[3]{complexity} \times 10.0$, which are
 convolved with a Gaussian 2D kernel (resulting in a blur) of
 standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$.
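The two formulas above can be evaluated directly. The sketch below only computes $\alpha$ and $\sigma$ from $complexity$; it does not implement the displacement fields themselves. The function name and the assumption that $complexity$ lies in $[0,1]$ are ours.

```python
def elastic_parameters(complexity):
    """Return (alpha, sigma) for the elastic deformation.

    alpha = complexity^(1/3) * 10
    sigma = 10 - 7 * complexity^(1/3)
    Assumption: 0 <= complexity <= 1.
    """
    cube_root = complexity ** (1.0 / 3.0)
    alpha = cube_root * 10.0
    sigma = 10.0 - 7.0 * cube_root
    return alpha, sigma
```

Under these formulas, larger complexity gives stronger displacements (larger $\alpha$) that are smoothed less (smaller $\sigma$): at $complexity = 0$ the pair is $(0, 10)$ and at $complexity = 1$ it is $(10, 3)$.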
 \end{center}
 \end{wrapfigure}
 %\vspace{.6cm}
 %\end{minipage}%
 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-This is the ``Whirl and pinch'' GIMP filter with whirl was set to 0.
+The {\bf pinch} module applies the ``Whirl and pinch'' GIMP filter with whirl set to 0.
 A pinch is ``similar to projecting the image onto an elastic
 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual).
 For a square input image, draw a radius-$r$ disk
 around $C$. Any pixel $P$ belonging to
 that disk has its value replaced by
 the value of a ``source'' pixel in the original image,
 on the line that goes through $C$ and $P$, but
 at some other distance $d_2$. Define $d_1=distance(P,C)$ and $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times
-d_1$, where $pinch$ is a parameter to the filter.
+d_1$, where $pinch$ is a parameter of the filter.
 The actual value is given by bilinear interpolation considering the pixels
 around the (non-integer) source position thus found.
 Here $pinch \sim U[-complexity, 0.7 \times complexity]$.
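The radial mapping of the pinch can be sketched as follows, assuming the source distance is $d_2 = sin(\frac{\pi d_1}{2r})^{-pinch} \times d_1$. The function name and the handling of the centre pixel and of pixels outside the disk are assumptions for this example; the bilinear interpolation step is not shown.

```python
import math


def pinch_source_distance(d1, r, pinch):
    """Distance d2 from the centre C at which the source pixel is read.

    d1: distance of the output pixel P from C.
    r: radius of the disk affected by the pinch.
    Assumptions: the centre pixel maps to itself (this also avoids 0 raised
    to a negative power), and pixels outside the disk are left unchanged.
    """
    if d1 <= 0.0:
        return 0.0
    if d1 >= r:
        return d1
    return math.sin(math.pi * d1 / (2.0 * r)) ** (-pinch) * d1
```

With $pinch = 0$ the mapping is the identity. For $0 < d_1 < r$, a positive $pinch$ gives $d_2 > d_1$ (the source lies farther from $C$) and a negative $pinch$ gives $d_2 < d_1$.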
 %\vspace{1.5cm}
 %\end{minipage}
 \includegraphics[scale=.4]{images/Motionblur_only.png}\\
 {\bf Motion Blur}
 \end{minipage}%
 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
 %\vspace*{.5mm}
-This is GIMP's ``linear motion blur''
+The {\bf motion blur} module is GIMP's ``linear motion blur'', which
-with parameters $length$ and $angle$. The value of
+has parameters $length$ and $angle$. The value of
 a pixel in the final image is approximately the  mean of the first $length$ pixels
 found by moving in the $angle$ direction,
 $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.
 \vspace{5mm}
 \end{minipage}
 {\bf Occlusion}
 %\vspace{.5cm}
 \end{minipage}%
 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
 \vspace*{-18mm}
-Selects a random rectangle from an {\em occluder} character
+The {\bf occlusion} module selects a random rectangle from an {\em occluder} character
 image and places it over the original {\em occluded}
 image. Pixels are combined by taking max(occluder, occluded),
 i.e. the value closer to black. The rectangle corners
 are sampled so that larger complexity gives larger rectangles.
 The destination position in the occluded image is also sampled
 according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}).
-This filter is skipped with probability 60\%.
+This module is skipped with probability 60\%.
 %\vspace{7mm}
 \end{minipage}
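The pixel-combination rule of the occlusion module can be sketched as below. Only the max(occluder, occluded) step is shown; how the rectangle and its destination are sampled is described in the technical report and is not reproduced here. The function name, the argument names, and the clipping at the image border are assumptions for this example.

```python
def occlude(image, patch, top, left):
    """Paste `patch` over `image` at (top, left), keeping the max per pixel.

    Assumption: both arguments are lists of rows of grey levels, with larger
    values closer to black. Parts of the patch that fall outside the image
    are ignored. The input image is not modified.
    """
    out = [row[:] for row in image]
    n_rows, n_cols = len(out), len(out[0])
    for i, patch_row in enumerate(patch):
        for j, value in enumerate(patch_row):
            y, x = top + i, left + j
            if 0 <= y < n_rows and 0 <= x < n_cols:
                out[y][x] = max(out[y][x], value)
    return out
```

Because of the max, an occluder pixel only changes the result where it is darker than the pixel it covers.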
 \vspace*{1mm}
 \end{center}
 \end{wrapfigure}
 %\vspace{.5cm}
 %\end{minipage}%
 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth}
-Different regions of the image are spatially smoothed by convolving
+With the {\bf Gaussian smoothing} module,
+different regions of the image are spatially smoothed by convolving
 the image with a symmetric Gaussian kernel of
 size and variance chosen uniformly in the ranges $[12,12 + 20 \times
 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
 between $0$ and $1$.  We also create a symmetric weighted averaging window, of the
 kernel size, with maximum value at the center.  For each image we sample
 averaging centers between the original image and the filtered one.  We
 initialize to zero a mask matrix of the image size. For each selected pixel
 we add to the mask the averaging window centered on it.  The final image is
 computed from the following element-wise operation: $\frac{image + filtered
 image \times mask}{mask+1}$.
-This filter is skipped with probability 75\%.
+This module is skipped with probability 75\%.
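The final element-wise blend $\frac{image + filtered\ image \times mask}{mask+1}$ can be sketched as follows. The Gaussian filtering and the construction of the mask are not shown; the function and argument names are assumptions for this example.

```python
def blend_with_mask(image, filtered, mask):
    """Element-wise (image + filtered * mask) / (mask + 1).

    Assumption: the three arguments are lists of rows of equal shape, and
    mask values are non-negative.
    """
    return [
        [(p + f * m) / (m + 1.0) for p, f, m in zip(img_row, flt_row, msk_row)]
        for img_row, flt_row, msk_row in zip(image, filtered, mask)
    ]
```

Where the mask is zero the original pixel is returned unchanged, and as the mask grows the result approaches the filtered pixel.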
 %\end{minipage}
 \newpage
 \vspace*{-9mm}
 \end{center}
 \end{wrapfigure}
 %\end{minipage}%
 %\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth}
 %\vspace*{-20mm}
-This filter permutes neighbouring pixels. It first selects
+This module {\bf permutes neighbouring pixels}. It first selects
 a fraction $\frac{complexity}{3}$ of the pixels randomly in the image. Each of them is then
 sequentially exchanged with another pixel in its $V4$ neighbourhood.
-This filter is skipped with probability 80\%.\\
+This module is skipped with probability 80\%.\\
 \vspace*{1mm}
 \end{minipage}
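A minimal sketch of the pixel permutation, assuming that the $V4$ neighbourhood means the four pixels directly above, below, left and right, and that the neighbour is chosen uniformly. The function name, the image representation, and the omission of the 80\% skip are assumptions for this example.

```python
import random


def permute_pixels(image, complexity, rng=None):
    """Swap a fraction complexity/3 of the pixels with a 4-neighbour.

    Assumptions: `image` is a list of equal-length rows, 0 <= complexity <= 3,
    and each selected pixel is exchanged with one of its in-bounds
    4-neighbours chosen uniformly at random. Swaps are applied sequentially,
    so a pixel may move more than once. The input image is not modified.
    """
    rng = rng if rng is not None else random.Random()
    n_rows, n_cols = len(image), len(image[0])
    out = [row[:] for row in image]
    coords = [(y, x) for y in range(n_rows) for x in range(n_cols)]
    n_selected = int(len(coords) * complexity / 3.0)
    for y, x in rng.sample(coords, n_selected):
        neighbours = [
            (y + dy, x + dx)
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
            if 0 <= y + dy < n_rows and 0 <= x + dx < n_cols
        ]
        if not neighbours:
            continue  # 1x1 image: nothing to swap with
        ny, nx = rng.choice(neighbours)
        out[y][x], out[ny][nx] = out[ny][nx], out[y][x]
    return out
```

Since pixels are only exchanged, the grey-level histogram of the image is unchanged; only local spatial structure is disturbed.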
-\vspace{-1mm}
+\vspace{-3mm}
 \begin{minipage}[t]{\linewidth}
 \begin{wrapfigure}[7]{l}{0.15\textwidth}
 %\vspace*{-3mm}
 \begin{center}
 \end{center}
 \end{wrapfigure}
 %\end{minipage}%
 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth}
 \vspace*{12mm}
-This filter simply adds, to each pixel of the image independently, a
+The {\bf Gaussian noise} module simply adds, to each pixel of the image independently, a
 noise $\sim Normal(0,(\frac{complexity}{10})^2)$.
-This filter is skipped with probability 70\%.
+This module is skipped with probability 70\%.
 %\vspace{1.1cm}
 \end{minipage}
-\vspace*{1.5cm}
+\vspace*{1.2cm}
 \begin{minipage}[t]{\linewidth}
 \begin{minipage}[t]{0.14\linewidth}
 \centering
 \includegraphics[scale=.4]{images/background_other_only.png}\\
 {\small \bf Bg Image}
 \end{minipage}%
 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
 \vspace*{-18mm}
-Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
+Following~\citet{Larochelle-jmlr-2009}, the {\bf background image} module adds a random
 background image behind the letter, from a randomly chosen natural image,
 with contrast adjustments depending on $complexity$, to preserve
 more or less of the original character image.
 %\vspace{.8cm}
 \end{minipage}
 \includegraphics[scale=.4]{images/Poivresel_only.png}\\
 {\small \bf Salt \& Pepper}
 \end{minipage}%
 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
 \vspace*{-18mm}
-This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
+The {\bf salt and pepper noise} module adds noise $\sim U[0,1]$ to random subsets of pixels.
 The fraction of selected pixels is $0.2 \times complexity$.
-This filter is skipped with probability 75\%.
+This module is skipped with probability 75\%.
 %\vspace{.9cm}
 \end{minipage}
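A sketch of the salt and pepper module under two loudly stated assumptions: $0.2 \times complexity$ is read as the fraction of pixels selected, and each selected pixel is replaced by a draw from $U[0,1]$ (the text says noise is ``added'', which could also be read as a sum). The function and parameter names are ours.

```python
import random


def salt_and_pepper(image, complexity, rng=None, skip_probability=0.75):
    """Overwrite a random subset of pixels with U[0,1] noise.

    Assumptions: 0.2 * complexity is the fraction of pixels selected, and a
    selected pixel is replaced (not incremented) by the noise value.
    The module is skipped with probability `skip_probability`.
    The input image is not modified.
    """
    rng = rng if rng is not None else random.Random()
    out = [row[:] for row in image]
    if rng.random() < skip_probability:
        return out
    coords = [(y, x) for y in range(len(out)) for x in range(len(out[0]))]
    n_selected = int(round(0.2 * complexity * len(coords)))
    for y, x in rng.sample(coords, n_selected):
        out[y][x] = rng.uniform(0.0, 1.0)
    return out
```

Exposing the skip probability as a parameter makes the sketch easy to check: with `skip_probability=0.0` and $complexity = 1$, one fifth of the pixels are changed.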
 %\vspace{-.7cm}
 \vspace{1mm}
 %\end{minipage}%
 \end{center}
 \end{wrapfigure}
 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth}
 %\vspace{.4cm}
-The scratches module places line-like white patches on the image.  The
+The {\bf scratches} module places line-like white patches on the image.  The
 lines are heavily transformed images of the digit ``1'' (one), chosen
 at random among 500 such 1 images,
 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
 complexity)^2)$ (in degrees), using bi-cubic interpolation.
 Two passes of a grey-scale morphological erosion filter
 are applied, reducing the width of the line
 by an amount controlled by $complexity$.
-This filter is skipped with probability 85\%. The probabilities
+This module is skipped with probability 85\%. The probabilities
 of applying 1, 2, or 3 patches are (50\%,30\%,20\%).
 \end{minipage}
 \vspace*{2mm}
-\begin{minipage}[t]{0.20\linewidth}
+\begin{minipage}[t]{0.25\linewidth}
 \centering
-\hspace*{-7mm}\includegraphics[scale=.4]{images/Contrast_only.png}\\
+\hspace*{-16mm}\includegraphics[scale=.4]{images/Contrast_only.png}\\
-{\bf Grey \& Contrast}
+{\bf Grey Level \& Contrast}
 \end{minipage}%
-\hspace{-4mm}\begin{minipage}[t]{0.82\linewidth}
+\hspace{-12mm}\begin{minipage}[t]{0.82\linewidth}
 \vspace*{-18mm}
-This filter changes the contrast by changing grey levels, and may invert the image polarity (white
+The {\bf grey level and contrast} module changes the contrast by changing grey levels, and may invert the image polarity (white
 to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$
 so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
 polarity is inverted with probability 50\%.
 %\vspace{.7cm}
 \end{minipage}
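The contrast rule can be sketched as below, assuming input grey levels in $[0,1]$ and a linear rescaling $p \mapsto \frac{1-C}{2} + C \times p$, which maps $[0,1]$ onto $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The function name and the order of inversion and rescaling are assumptions for this example.

```python
import random


def grey_level_and_contrast(image, complexity, rng=None):
    """Rescale grey levels to contrast C and invert polarity half the time.

    C ~ U[1 - 0.85 * complexity, 1]; pixels are mapped linearly into
    [(1 - C) / 2, 1 - (1 - C) / 2].
    Assumptions: input pixels lie in [0, 1] and 0 <= complexity <= 1.
    """
    rng = rng if rng is not None else random.Random()
    contrast = rng.uniform(1.0 - 0.85 * complexity, 1.0)
    low = (1.0 - contrast) / 2.0
    invert = rng.random() < 0.5
    out = []
    for row in image:
        new_row = []
        for p in row:
            if invert:
                p = 1.0 - p  # white to black and black to white
            new_row.append(low + contrast * p)
        out.append(new_row)
    return out
```

At $complexity = 0$ the contrast is always $1$, so the output is either the input or its inverse.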
 \label{fig:error-rates-charts}
 \vspace*{-2mm}
 \end{figure}
+\begin{figure}[ht]
+\vspace*{-3mm}
+\centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}}
+\vspace*{-3mm}
+\caption{Relative improvement in error rate due to self-taught learning.
+Left: Improvement (or loss, when negative)
+induced by out-of-distribution examples (perturbed data).
+Right: Improvement (or loss, when negative) induced by multi-task
+learning (training on all classes and testing only on either digits,
+upper case, or lower-case). The deep learner (SDA) benefits more from
+both self-taught learning scenarios, compared to the shallow MLP.}
+\label{fig:improvements-charts}
+\vspace*{-2mm}
+\end{figure}
 \section{Experimental Results}
 \vspace*{-2mm}
 %\vspace*{-1mm}
 %\subsection{SDA vs MLP vs Humans}
 and the 10-class (digits) task.
 17\% error (SDA1) or 18\% error (humans) may seem large but a large
 majority of the errors from humans and from SDA1 are from out-of-context
 confusions (e.g. a vertical bar can be a ``1'', an ``l'' or an ``L'', and a
 ``c'' and a ``C'' are often indistinguishable).
-\begin{figure}[ht]
-\vspace*{-3mm}
-\centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}}
-\vspace*{-3mm}
-\caption{Relative improvement in error rate due to self-taught learning.
-Left: Improvement (or loss, when negative)
-induced by out-of-distribution examples (perturbed data).
-Right: Improvement (or loss, when negative) induced by multi-task
-learning (training on all classes and testing only on either digits,
-upper case, or lower-case). The deep learner (SDA) benefits more from
-both self-taught learning scenarios, compared to the shallow MLP.}
-\label{fig:improvements-charts}
-\vspace*{-2mm}
-\end{figure}
 In addition, as shown in the left of
 Figure~\ref{fig:improvements-charts}, the relative improvement in error
 rate brought by self-taught learning is greater for the SDA, and these
 differences with the MLP are statistically and qualitatively
