comparison writeup/nips2010_submission.tex @ 560:dc5c3f538a05

Small fixes (typos / precisions)
author Olivier Delalleau <delallea@iro>
date Thu, 03 Jun 2010 11:02:39 -0400
parents cf5a7ee2d892
children b9b811e886ae
converted into a deep supervised feedforward neural network and fine-tuned by
stochastic gradient descent.

Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles
of semi-supervised and multi-task learning: the learner can exploit examples
that are unlabeled and possibly come from a distribution different from the target
distribution, e.g., from other classes than those of interest.
It has already been shown that deep learners can clearly take advantage of
unsupervised learning and unlabeled examples~\citep{Bengio-2009,WestonJ2008-small},
but more needs to be done to explore the impact
of {\em out-of-distribution} examples and of the multi-task setting
\end{wrapfigure}
%\vspace{0.7cm}
%\end{minipage}%
%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
This section describes the different transformations we used to stochastically
transform $32 \times 32$ source images (such as the one on the left)
in order to obtain data from a larger distribution which
covers a domain substantially larger than the clean characters distribution from
which we start.
Although character transformations have been used before to
improve character recognizers, this effort is on a large scale both
increasing dimensions (largest is $5\times5$) were used. For each image,
randomly sample the operator type (dilation or erosion) with equal probability and one structural
element from a subset of the $n=round(m \times complexity)$ smallest structuring elements
where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters).
A neutral element (no transformation)
is always present in the set.
%\vspace{.4cm}
%\end{minipage}
%\vspace{-.7cm}
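The sampling logic above can be sketched in NumPy (a hedged sketch, not the authors' code: the structuring-element set is reduced to illustrative square footprints, and `morph_transform` is a hypothetical name):

```python
import numpy as np

def morph_transform(img, complexity, rng):
    """Random dilation or erosion as described above. The structuring
    elements here are illustrative square footprints (1x1 up to 5x5);
    the 1x1 element is the neutral 'no transformation' case."""
    elements = [np.ones((k, k), bool) for k in (1, 2, 3, 4, 5)]
    dilate = rng.rand() < 0.5            # operator type, equal probability
    m = 10 if dilate else 6              # m = 6 for erosion (thin characters)
    n = max(1, min(len(elements), int(round(m * complexity))))
    se = elements[rng.randint(n)]        # one of the n smallest elements
    kh, kw = se.shape
    pad = np.pad(img, ((0, kh - 1), (0, kw - 1)),
                 constant_values=(0.0 if dilate else 1.0))
    H, W = img.shape
    # brute-force grey-scale morphology: max (dilation) / min (erosion)
    windows = [pad[i:i + H, j:j + W]
               for i in range(kh) for j in range(kw) if se[i, j]]
    return np.max(windows, axis=0) if dilate else np.min(windows, axis=0)
```

With $complexity = 0$ only the neutral element can be drawn, so the image passes through unchanged.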

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.4]{images/Slant_only.png}\\
{\bf Slant}
\end{minipage}%
\hspace{0.3cm}
\begin{minipage}[b]{0.83\linewidth}
%\centering
To produce {\bf slant}, each row of the image is shifted
proportionally to its height: $shift = round(slant \times height)$.
$slant \sim U[-complexity,complexity]$.
The shift is randomly chosen to be either to the left or to the right.
\vspace{1.1cm}
\end{minipage}
%\vspace*{-4mm}
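A minimal NumPy sketch of this row shift (assumptions: "height" is interpreted as the row index, and `np.roll` wraps pixels around rather than padding with background as the real filter likely does):

```python
import numpy as np

def slant_rows(img, complexity, rng):
    """Shift each row horizontally in proportion to its vertical position.
    shift = round(slant * row_index), slant ~ U[-complexity, complexity]."""
    s = rng.uniform(-complexity, complexity)
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        out[y] = np.roll(img[y], int(round(s * y)))  # wrap-around shift
    return out
```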

%\begin{minipage}[b]{0.14\linewidth}
%\centering
parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$.
Output pixel $(x,y)$ takes the value of input pixel
nearest to $(ax+by+c,dx+ey+f)$,
producing scaling, translation, rotation and shearing.
Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to
forbid large rotations (to avoid confusing classes) but to give good
variability of the transformation: $a$ and $d$ $\sim U[1-3\,complexity,1+3\,complexity]$,
$b$ and $e$ $\sim U[-3\,complexity,3\,complexity]$, and $c$ and $f \sim U[-4\,complexity,4\,complexity]$.\\
%\end{minipage}

\vspace*{-4.5mm}

\end{center}
\end{wrapfigure}
%\vspace{.6cm}
%\end{minipage}%
%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
The {\bf pinch} module applies the ``Whirl and pinch'' GIMP filter with whirl set to 0.
A pinch is ``similar to projecting the image onto an elastic
surface and pressing or pulling on the center of the surface'' (GIMP documentation manual).
For a square input image, draw a radius-$r$ disk
around its center $C$. Any pixel $P$ belonging to
that disk has its value replaced by
the value of a ``source'' pixel in the original image,
on the line that goes through $C$ and $P$, but
at some other distance $d_2$. Define $d_1=distance(P,C)$
and $d_2 = \sin(\frac{\pi{}d_1}{2r})^{-pinch} \times d_1$, where $pinch$ is a parameter of the filter.
The actual value is given by bilinear interpolation considering the pixels
around the (non-integer) source position thus found.
Here $pinch \sim U[-complexity, 0.7 \times complexity]$.
%\vspace{1.5cm}
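The pinch mapping can be sketched as below (assumptions: nearest-neighbour sampling replaces the bilinear interpolation of the real filter, and the disk radius is taken as the largest inscribed one):

```python
import numpy as np

def pinch(img, pinch_param):
    """Apply d2 = sin(pi*d1/(2r))^(-pinch) * d1 inside the centered disk,
    reading each pixel's value from the source position on the line C-P."""
    H, W = img.shape
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    r = min(cx, cy)                        # disk radius (an assumption)
    out = img.copy()
    for y in range(H):
        for x in range(W):
            d1 = np.hypot(x - cx, y - cy)
            if 0 < d1 <= r:
                d2 = np.sin(np.pi * d1 / (2 * r)) ** (-pinch_param) * d1
                # nearest-neighbour stand-in for bilinear interpolation
                sx = int(round(cx + (x - cx) * d2 / d1))
                sy = int(round(cy + (y - cy) * d2 / d1))
                if 0 <= sx < W and 0 <= sy < H:
                    out[y, x] = img[sy, sx]
    return out
```

Note that $pinch = 0$ gives $d_2 = d_1$ and hence the identity mapping.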
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
\vspace*{-18mm}
The {\bf occlusion} module selects a random rectangle from an {\em occluder} character
image and places it over the original {\em occluded}
image. Pixels are combined by taking the max(occluder, occluded),
i.e. keeping the lighter ones.
The rectangle corners
are sampled so that larger complexity gives larger rectangles.
The destination position in the occluded image is also sampled
according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}).
This module is skipped with probability 60\%.
%\vspace{7mm}
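A simplified NumPy sketch of the occlusion combine (assumptions: rectangle size and positions are drawn uniformly here, whereas the paper uses normal draws; pixel values are in $[0,1]$ with 1 = white):

```python
import numpy as np

def occlude(occluded, occluder, rng, complexity):
    """Paste a random rectangle from the occluder onto the occluded image,
    combining with an elementwise max (keeping the lighter pixels)."""
    if rng.rand() < 0.6:                  # module skipped 60% of the time
        return occluded.copy()
    H, W = occluded.shape
    h = max(1, int(round(complexity * H)))   # larger complexity ->
    w = max(1, int(round(complexity * W)))   # larger rectangles
    top, left = rng.randint(0, H - h + 1), rng.randint(0, W - w + 1)
    patch = occluder[top:top + h, left:left + w]
    dy, dx = rng.randint(0, H - h + 1), rng.randint(0, W - w + 1)
    out = occluded.copy()
    out[dy:dy + h, dx:dx + w] = np.maximum(out[dy:dy + h, dx:dx + w], patch)
    return out
```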
\end{wrapfigure}
%\vspace{.5cm}
%\end{minipage}%
%\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth}
With the {\bf Gaussian smoothing} module,
different regions of the image are spatially smoothed.
This is achieved by first convolving
the image with an isotropic Gaussian kernel of
size and variance chosen uniformly in the ranges $[12,12 + 20 \times
complexity]$ and $[2,2 + 6 \times complexity]$. This filtered image is normalized
between $0$ and $1$. We also create an isotropic weighted averaging window, of the
kernel size, with maximum value at the center. For each image we sample
uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be
averaging centers between the original image and the filtered one. We
initialize to zero a mask matrix of the image size. For each selected pixel
we add to the mask the averaging window centered on it. The final image is
computed from the following element-wise operation: $\frac{image + filtered\_image
\times mask}{mask+1}$.
This module is skipped with probability 75\%.
%\end{minipage}
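The whole procedure can be sketched as follows (a hedged sketch: the blur uses a naive convolution for self-containment, the averaging window is assumed to be a rescaled Gaussian, and border handling is a guess):

```python
import numpy as np

def gaussian_kernel(size, var):
    # isotropic 2-D Gaussian, peak at the center, normalized to sum to 1
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * var))
    k = np.outer(g, g)
    return k / k.sum()

def local_smooth(img, complexity, rng):
    """Blur the image, then blend the blurred version back only around a
    few 'averaging centers' via (image + filtered * mask) / (mask + 1)."""
    size = int(rng.uniform(12, 12 + 20 * complexity)) | 1   # odd kernel size
    var = rng.uniform(2, 2 + 6 * complexity)
    k = gaussian_kernel(size, var)
    H, W = img.shape
    pad = size // 2
    padded = np.pad(img, pad, mode='edge')
    filtered = np.zeros_like(img)
    for y in range(H):                     # naive O(H*W*size^2) convolution
        for x in range(W):
            filtered[y, x] = (padded[y:y + size, x:x + size] * k).sum()
    filtered = (filtered - filtered.min()) / (filtered.max() - filtered.min() + 1e-8)
    win = k / k.max()                      # averaging window, max 1 at center
    mask = np.zeros_like(img)
    n_centers = rng.randint(3, 3 + int(10 * complexity) + 1)
    for _ in range(n_centers):
        cy, cx = rng.randint(0, H), rng.randint(0, W)
        y0, y1 = max(0, cy - pad), min(H, cy + pad + 1)
        x0, x1 = max(0, cx - pad), min(W, cx + pad + 1)
        # add the window, clipped to the image borders
        mask[y0:y1, x0:x1] += win[pad - (cy - y0):pad + (y1 - cy),
                                  pad - (cx - x0):pad + (x1 - cx)]
    return (img + filtered * mask) / (mask + 1)
```

Since the result is a pixel-wise weighted average of two images in $[0,1]$, the output stays in $[0,1]$.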

\newpage

\end{center}
\end{wrapfigure}
%\end{minipage}%
%\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth}
%\vspace*{-20mm}
This module {\bf permutes neighbouring pixels}. It first selects a
fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each
of these pixels is then sequentially exchanged with a random pixel
among its four nearest neighbors (on its left, right, top or bottom).
This module is skipped with probability 80\%.\\
\vspace*{1mm}
\end{minipage}
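A minimal sketch of the pixel permutation (swaps that would fall outside the image are simply skipped, which is an assumption):

```python
import numpy as np

def permute_pixels(img, complexity, rng):
    """Exchange a fraction complexity/3 of pixels with a random one of
    their four nearest neighbours (left, right, top, bottom)."""
    out = img.copy()
    H, W = out.shape
    n = int(round(complexity / 3.0 * H * W))
    ys = rng.randint(0, H, size=n)
    xs = rng.randint(0, W, size=n)
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for y, x in zip(ys, xs):
        dy, dx = offsets[rng.randint(4)]
        ny, nx = y + dy, x + dx
        if 0 <= ny < H and 0 <= nx < W:   # skip swaps outside the image
            out[y, x], out[ny, nx] = out[ny, nx], out[y, x]
    return out
```

Because the module only swaps pixels, the multiset of pixel values is preserved exactly.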

\vspace{-3mm}
by an amount controlled by $complexity$.
This module is skipped with probability 85\%. The probabilities
of applying 1, 2, or 3 patches are (50\%,30\%,20\%).
\end{minipage}

\vspace*{1mm}

\begin{minipage}[t]{0.25\linewidth}
\centering
\hspace*{-16mm}\includegraphics[scale=.4]{images/Contrast_only.png}\\
{\bf Grey Level \& Contrast}
\end{minipage}%
\hspace{-12mm}\begin{minipage}[t]{0.82\linewidth}
\vspace*{-18mm}
The {\bf grey level and contrast} module changes the contrast by changing grey levels, and may invert the image polarity (white
to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$
so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
polarity is inverted with probability 50\%.
%\vspace{.7cm}
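This rescaling is short enough to sketch directly (assuming input pixel values already lie in $[0,1]$):

```python
import numpy as np

def grey_contrast(img, complexity, rng):
    """Rescale values into [(1-C)/2, 1-(1-C)/2] with
    C ~ U[1-0.85*complexity, 1]; invert polarity half the time."""
    C = rng.uniform(1 - 0.85 * complexity, 1)
    out = (1 - C) / 2.0 + C * img   # assumes input in [0, 1]
    if rng.rand() < 0.5:
        out = 1.0 - out             # polarity inversion
    return out
```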
(bottom right) is used as a training example.}
\label{fig:pipeline}
\end{figure}
\fi

\vspace*{-3mm}
\section{Experimental Setup}
\vspace*{-1mm}

Much previous work on deep learning had been performed on
the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
495 the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009}, 499 the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009},