# HG changeset patch # User Olivier Delalleau # Date 1275571082 14400 # Node ID cf5a7ee2d89222c0f044f7681e93ff0b0dfd8fab # Parent 143a1467f157e1235f2c4f42203279b40862bf3c# Parent 17d16700e0c8c19bb7a199de93d8fe8dbca8fdf9 Merged diff -r 143a1467f157 -r cf5a7ee2d892 writeup/nips2010_submission.tex --- a/writeup/nips2010_submission.tex Thu Jun 03 09:16:53 2010 -0400 +++ b/writeup/nips2010_submission.tex Thu Jun 03 09:18:02 2010 -0400 @@ -107,7 +107,8 @@ Our experimental results provide positive evidence towards all of these questions. To achieve these results, we introduce in the next section a sophisticated system -for stochastically transforming character images. The conclusion discusses +for stochastically transforming character images and then explain the methodology. +The conclusion discusses the more general question of why deep learners may benefit so much from the self-taught learning framework. @@ -165,7 +166,7 @@ %\end{minipage}% %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} \end{wrapfigure} -Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} +To change character {\bf thickness}, morphological operators of dilation and erosion~\citep{Haralick87,Serra82} are applied. The neighborhood of each pixel is multiplied element-wise with a {\em structuring element} matrix. The pixel value is replaced by the maximum or the minimum of the resulting @@ -188,7 +189,7 @@ \hspace{0.3cm}\begin{minipage}[b]{0.83\linewidth} %\centering %\vspace*{-15mm} -Each row of the image is shifted +To produce {\bf slant}, each row of the image is shifted proportionally to its height: $shift = round(slant \times height)$. $slant \sim U[-complexity,complexity]$. 
\vspace{1.5cm} @@ -201,12 +202,12 @@ \vspace*{-6mm} \begin{center} \includegraphics[scale=.4]{images/Affine_only.png}\\ -{\bf Affine} +{\bf Affine Transformation} \end{center} \end{wrapfigure} %\end{minipage}% %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -A $2 \times 3$ affine transform matrix (with +A $2 \times 3$ {\bf affine transform} matrix (with parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$. Output pixel $(x,y)$ takes the value of input pixel nearest to $(ax+by+c,dx+ey+f)$, @@ -234,8 +235,8 @@ %\end{minipage}% %\hspace{-3mm}\begin{minipage}[b]{0.85\linewidth} %\vspace*{-20mm} -This local elastic deformation -filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, +The {\bf local elastic} deformation +module induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, which provides more details. The intensity of the displacement fields is given by $\alpha = \sqrt[3]{complexity} \times 10.0$, which are @@ -258,7 +259,7 @@ %\vspace{.6cm} %\end{minipage}% %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} -This is the ``Whirl and pinch'' GIMP filter with whirl was set to 0. +The {\bf pinch} module applies the ``Whirl and pinch'' GIMP filter with whirl set to 0. A pinch is ``similar to projecting the image onto an elastic surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). For a square input image, draw a radius-$r$ disk @@ -267,7 +268,7 @@ the value of a ``source'' pixel in the original image, on the line that goes through $C$ and $P$, but at some other distance $d_2$. Define $d_1=distance(P,C) = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times -d_1$, where $pinch$ is a parameter to the filter. +d_1$, where $pinch$ is a parameter of the filter. The actual value is given by bilinear interpolation considering the pixels around the (non-integer) source position thus found. Here $pinch \sim U[-complexity, 0.7 \times complexity]$.
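To make the pinch mapping in the hunk above easier to check, here is a minimal Python sketch of it; it is an illustration under stated assumptions, not the authors' implementation. It reads the definition as giving the source distance $d_2 = \sin(\frac{\pi d_1}{2r})^{-pinch} \times d_1$ (the context line writes the left-hand side as $d_1$), assumes $r$ defaults to half the image side, copies the centre pixel and pixels outside the disk unchanged, and clamps coordinates during bilinear interpolation. The names `pinch`, `bilinear`, and `sample_pinch_amount` are hypothetical.

```python
import math
import random


def bilinear(img, x, y):
    """Sample img (a list of rows) at real-valued column x and row y."""
    h, w = len(img), len(img[0])
    # Assumption: clamp to the image so border pixels are reused outside it.
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x1]
    bottom = (1 - fx) * img[y1][x0] + fx * img[y1][x1]
    return (1 - fy) * top + fy * bottom


def pinch(img, pinch_amount, radius=None):
    """Hypothetical pinch sketch for a square image given as a list of rows."""
    n = len(img)
    c = (n - 1) / 2.0                                 # centre C of the image
    r = radius if radius is not None else n / 2.0     # assumed default radius
    out = [row[:] for row in img]
    for py in range(n):
        for px in range(n):
            dx, dy = px - c, py - c
            d1 = math.hypot(dx, dy)                   # d1 = distance(P, C)
            if d1 == 0.0 or d1 > r:
                continue  # centre and pixels outside the disk are kept as-is
            # Source distance d2 along the line through C and P.
            d2 = math.sin(math.pi * d1 / (2 * r)) ** (-pinch_amount) * d1
            scale = d2 / d1
            out[py][px] = bilinear(img, c + dx * scale, c + dy * scale)
    return out


def sample_pinch_amount(complexity, rng=random):
    """pinch ~ U[-complexity, 0.7 * complexity], as stated in the text."""
    return rng.uniform(-complexity, 0.7 * complexity)
```

With `pinch_amount = 0` the exponent vanishes, so $d_2 = d_1$ and the sketch returns the input unchanged. At $d_1 = r$ the sine equals 1, so the mapping is continuous at the disk boundary.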
@@ -289,8 +290,8 @@ \end{minipage}% \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} %\vspace*{.5mm} -This is GIMP's ``linear motion blur'' -with parameters $length$ and $angle$. The value of +The {\bf motion blur} module is GIMP's ``linear motion blur'', which +has parameters $length$ and $angle$. The value of a pixel in the final image is approximately the mean of the first $length$ pixels found by moving in the $angle$ direction, $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$. @@ -307,14 +308,14 @@ \end{minipage}% \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} \vspace*{-18mm} -Selects a random rectangle from an {\em occluder} character +The {\bf occlusion} module selects a random rectangle from an {\em occluder} character image and places it over the original {\em occluded} image. Pixels are combined by taking the max(occluder,occluded), closer to black. The rectangle corners are sampled so that larger complexity gives larger rectangles. The destination position in the occluded image are also sampled according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}). -This filter is skipped with probability 60\%. +This module is skipped with probability 60\%. %\vspace{7mm} \end{minipage} @@ -332,7 +333,8 @@ %\vspace{.5cm} %\end{minipage}% %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} -Different regions of the image are spatially smoothed by convolving +With the {\bf Gaussian smoothing} module, +different regions of the image are spatially smoothed by convolving the image with a symmetric Gaussian kernel of size and variance chosen uniformly in the ranges $[12,12 + 20 \times complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized @@ -344,7 +346,7 @@ we add to the mask the averaging window centered to it. The final image is computed from the following element-wise operation: $\frac{image + filtered image \times mask}{mask+1}$. -This filter is skipped with probability 75\%. 
+This module is skipped with probability 75\%. %\end{minipage} \newpage @@ -364,14 +366,14 @@ %\end{minipage}% %\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth} %\vspace*{-20mm} -This filter permutes neighbouring pixels. It first selects +This module {\bf permutes neighbouring pixels}. It first selects fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them are then sequentially exchanged with one other in as $V4$ neighbourhood. -This filter is skipped with probability 80\%.\\ +This module is skipped with probability 80\%.\\ \vspace*{1mm} \end{minipage} -\vspace{-1mm} +\vspace{-3mm} \begin{minipage}[t]{\linewidth} \begin{wrapfigure}[7]{l}{0.15\textwidth} @@ -387,13 +389,13 @@ %\end{minipage}% %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} \vspace*{12mm} -This filter simply adds, to each pixel of the image independently, a +The {\bf Gaussian noise} module simply adds, to each pixel of the image independently, a noise $\sim Normal(0,(\frac{complexity}{10})^2)$. -This filter is skipped with probability 70\%. +This module is skipped with probability 70\%. %\vspace{1.1cm} \end{minipage} -\vspace*{1.5cm} +\vspace*{1.2cm} \begin{minipage}[t]{\linewidth} \begin{minipage}[t]{0.14\linewidth} @@ -403,7 +405,7 @@ \end{minipage}% \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} \vspace*{-18mm} -Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random +Following~\citet{Larochelle-jmlr-2009}, the {\bf background image} module adds a random background image behind the letter, from a randomly chosen natural image, with contrast adjustments depending on $complexity$, to preserve more or less of the original character image. @@ -419,9 +421,9 @@ \end{minipage}% \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} \vspace*{-18mm} -This filter adds noise $\sim U[0,1]$ to random subsets of pixels. +The {\bf salt and pepper noise} module adds noise $\sim U[0,1]$ to random subsets of pixels. The number of selected pixels is $0.2 \times complexity$. 
-This filter is skipped with probability 75\%. +This module is skipped with probability 75\%. %\vspace{.9cm} \end{minipage} %\vspace{-.7cm} @@ -441,7 +443,7 @@ \end{wrapfigure} %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} %\vspace{.4cm} -The scratches module places line-like white patches on the image. The +The {\bf scratches} module places line-like white patches on the image. The lines are heavily transformed images of the digit ``1'' (one), chosen at random among 500 such 1 images, randomly cropped and rotated by an angle $\sim Normal(0,(100 \times @@ -449,20 +451,20 @@ Two passes of a grey-scale morphological erosion filter are applied, reducing the width of the line by an amount controlled by $complexity$. -This filter is skipped with probability 85\%. The probabilities +This module is skipped with probability 85\%. The probabilities of applying 1, 2, or 3 patches are (50\%,30\%,20\%). \end{minipage} \vspace*{2mm} -\begin{minipage}[t]{0.20\linewidth} +\begin{minipage}[t]{0.25\linewidth} \centering -\hspace*{-7mm}\includegraphics[scale=.4]{images/Contrast_only.png}\\ -{\bf Grey \& Contrast} +\hspace*{-16mm}\includegraphics[scale=.4]{images/Contrast_only.png}\\ +{\bf Grey Level \& Contrast} \end{minipage}% -\hspace{-4mm}\begin{minipage}[t]{0.82\linewidth} -\vspace*{-18mm} -This filter changes the contrast by changing grey levels, and may invert the image polarity (white +\hspace{-12mm}\begin{minipage}[t]{0.82\linewidth} +\vspace*{-18mm} +The {\bf grey level and contrast} module changes the contrast by changing grey levels, and may invert the image polarity (white to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$ so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The polarity is inverted with probability 50\%.
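As an illustration only, the grey level and contrast module described above can be sketched in Python. The sketch assumes images are lists of rows with values in $[0,1]$, interprets "normalized into" as a linear min-max rescaling onto $[\frac{1-C}{2},1-\frac{1-C}{2}]$, and implements polarity inversion as $1 - v$. Because that interval is symmetric around 0.5, inversion keeps the result inside it. The function name `grey_level_contrast` and the mid-grey fallback for a constant image are hypothetical choices, not taken from the source.

```python
import random


def grey_level_contrast(img, complexity, rng=random):
    """Hypothetical sketch: rescale grey levels to contrast C, maybe invert."""
    # C ~ U[1 - 0.85 * complexity, 1], as stated in the text.
    C = rng.uniform(1 - 0.85 * complexity, 1.0)
    lo_t, hi_t = (1 - C) / 2, 1 - (1 - C) / 2      # target grey-level interval
    flat = [v for row in img for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    out = []
    for row in img:
        new_row = []
        for v in row:
            # Assumption: linear min-max normalisation; a constant image maps to mid-grey.
            u = (v - lo) / span if span > 0 else 0.5
            new_row.append(lo_t + u * (hi_t - lo_t))
        out.append(new_row)
    # Polarity (white <-> black) is inverted with probability 50%.
    if rng.random() < 0.5:
        out = [[1.0 - v for v in row] for row in out]
    return out
```

The output range always has width $C$ and is centred on 0.5, so larger $complexity$ can only lower the contrast, never push grey levels outside $[0,1]$.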
@@ -710,6 +712,21 @@ \end{figure} +\begin{figure}[ht] +\vspace*{-3mm} +\centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}} +\vspace*{-3mm} +\caption{Relative improvement in error rate due to self-taught learning. +Left: Improvement (or loss, when negative) +induced by out-of-distribution examples (perturbed data). +Right: Improvement (or loss, when negative) induced by multi-task +learning (training on all classes and testing only on either digits, +upper case, or lower-case). The deep learner (SDA) benefits more from +both self-taught learning scenarios, compared to the shallow MLP.} +\label{fig:improvements-charts} +\vspace*{-2mm} +\end{figure} + \section{Experimental Results} \vspace*{-2mm} @@ -739,21 +756,6 @@ confusions (e.g. a vertical bar can be a ``1'', an ``l'' or an ``L'', and a ``c'' and a ``C'' are often indistinguishible). -\begin{figure}[ht] -\vspace*{-3mm} -\centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}} -\vspace*{-3mm} -\caption{Relative improvement in error rate due to self-taught learning. -Left: Improvement (or loss, when negative) -induced by out-of-distribution examples (perturbed data). -Right: Improvement (or loss, when negative) induced by multi-task -learning (training on all classes and testing only on either digits, -upper case, or lower-case). The deep learner (SDA) benefits more from -both self-taught learning scenarios, compared to the shallow MLP.} -\label{fig:improvements-charts} -\vspace*{-2mm} -\end{figure} - In addition, as shown in the left of Figure~\ref{fig:improvements-charts}, the relative improvement in error rate brought by self-taught learning is greater for the SDA, and these