# HG changeset patch
# User Olivier Delalleau
# Date 1275577359 14400
# Node ID dc5c3f538a0578963f62a5b107b676a0568aa5b0
# Parent cf5a7ee2d89222c0f044f7681e93ff0b0dfd8fab
Small fixes (typos / precisions)

diff -r cf5a7ee2d892 -r dc5c3f538a05 writeup/nips2010_submission.tex
--- a/writeup/nips2010_submission.tex Thu Jun 03 09:18:02 2010 -0400
+++ b/writeup/nips2010_submission.tex Thu Jun 03 11:02:39 2010 -0400
@@ -68,7 +68,7 @@
 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles
 of semi-supervised and multi-task learning: the learner can exploit examples
-that are unlabeled and/or come from a distribution different from the target
+that are unlabeled and possibly come from a distribution different from the target
 distribution, e.g., from other classes than those of interest. It has
 already been shown that deep learners can clearly take
 advantage of unsupervised learning and unlabeled
 examples~\citep{Bengio-2009,WestonJ2008-small},
@@ -129,7 +129,7 @@
 %\end{minipage}%
 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
 This section describes the different transformations we used to stochastically
-transform source images such as the one on the left
+transform $32 \times 32$ source images (such as the one on the left)
 in order to obtain data from a larger distribution which
 covers a domain substantially larger than the clean characters distribution from
 which we start.
@@ -176,7 +176,7 @@
 element from a subset of the $n=round(m \times complexity)$ smallest structuring
 elements where $m=10$ for dilation and $m=6$ for erosion (to avoid
 completely erasing thin characters). A neutral element (no transformation)
-is always present in the set. is applied.
+is always present in the set.
 %\vspace{.4cm}
 %\end{minipage}
 %\vspace{-.7cm}
@@ -186,13 +186,14 @@
 \includegraphics[scale=.4]{images/Slant_only.png}\\
 {\bf Slant}
 \end{minipage}%
-\hspace{0.3cm}\begin{minipage}[b]{0.83\linewidth}
+\hspace{0.3cm}
+\begin{minipage}[b]{0.83\linewidth}
 %\centering
-%\vspace*{-15mm}
 To produce {\bf slant}, each row of the image is shifted
 proportionally to its height: $shift = round(slant \times height)$.
 $slant \sim U[-complexity,complexity]$.
-\vspace{1.5cm}
+The shift is randomly chosen to be either to the left or to the right.
+\vspace{1.1cm}
 \end{minipage}
 %\vspace*{-4mm}
 
@@ -213,10 +214,10 @@
 nearest to $(ax+by+c,dx+ey+f)$, producing scaling, translation,
 rotation and shearing.
 Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to
-forbid large rotations (not to confuse classes) but to give good
+forbid large rotations (to avoid confusing classes) but to give good
 variability of the transformation: $a$ and $d$ $\sim U[1-3
-complexity,1+3\,complexity]$, $b$ and $e$ $\sim[-3 \,complexity,3\,
-complexity]$ and $c$ and $f$ $\sim U[-4 \,complexity, 4 \,
+complexity,1+3\,complexity]$, $b$ and $e$ $\sim U[-3 \,complexity,3\,
+complexity]$, and $c$ and $f \sim U[-4 \,complexity, 4 \,
 complexity]$.\\
 %\end{minipage}
 
@@ -259,15 +260,16 @@
 %\vspace{.6cm}
 %\end{minipage}%
 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
-The {\bf pinch} module applies the ``Whirl and pinch'' GIMP filter with whirl was set to 0.
+The {\bf pinch} module applies the ``Whirl and pinch'' GIMP filter with whirl set to 0.
 A pinch is ``similar to projecting the image onto an elastic surface and
 pressing or pulling on the center of the surface'' (GIMP documentation manual).
 For a square input image, draw a radius-$r$ disk
-around $C$. Any pixel $P$ belonging to
+around its center $C$. Any pixel $P$ belonging to
 that disk has its value replaced by the value of a ``source'' pixel in the
 original image, on the line that goes through $C$ and $P$, but
-at some other distance $d_2$. Define $d_1=distance(P,C) = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times
+at some other distance $d_2$. Define $d_1=distance(P,C)$
+and $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times
 d_1$, where $pinch$ is a parameter of the filter.
 The actual value is given by bilinear interpolation considering the pixels
 around the (non-integer) source position thus found.
@@ -310,8 +312,9 @@
 \vspace*{-18mm}
 The {\bf occlusion} module selects a random rectangle from an {\em occluder}
 character image and places it over the original {\em occluded}
-image. Pixels are combined by taking the max(occluder,occluded),
-closer to black. The rectangle corners
+image. Pixels are combined by taking the max(occluder, occluded),
+i.e. keeping the lighter ones.
+The rectangle corners
 are sampled so that larger complexity gives larger rectangles.
 The destination position in the occluded image are also sampled according to
 a normal distribution (more details in~\citet{ift6266-tr-anonymous}).
@@ -334,18 +337,19 @@
 %\end{minipage}%
 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth}
 With the {\bf Gaussian smoothing} module,
-different regions of the image are spatially smoothed by convolving
-the image with a symmetric Gaussian kernel of
+different regions of the image are spatially smoothed.
+This is achieved by first convolving
+the image with an isotropic Gaussian kernel of
 size and variance chosen uniformly in the ranges $[12,12 + 20 \times
-complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
-between $0$ and $1$. We also create a symmetric weighted averaging window, of the
+complexity]$ and $[2,2 + 6 \times complexity]$. This filtered image is normalized
+between $0$ and $1$. We also create an isotropic weighted averaging window, of the
 kernel size, with maximum value at the center. For each image we sample
 uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be
 averaging centers between the original image and the filtered one.
 We initialize to zero a mask matrix of the image size. For each selected pixel
-we add to the mask the averaging window centered to it. The final image is
-computed from the following element-wise operation: $\frac{image + filtered
-image \times mask}{mask+1}$.
+we add to the mask the averaging window centered on it. The final image is
+computed from the following element-wise operation: $\frac{image + filtered\_image
+\times mask}{mask+1}$.
 This module is skipped with probability 75\%.
 %\end{minipage}
 
@@ -366,9 +370,10 @@
 %\end{minipage}%
 %\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth}
 %\vspace*{-20mm}
-This module {\bf permutes neighbouring pixels}. It first selects
-fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them are then
-sequentially exchanged with one other in as $V4$ neighbourhood.
+This module {\bf permutes neighbouring pixels}. It first selects a
+fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each
+of these pixels is then sequentially exchanged with a random pixel
+among its four nearest neighbors (on its left, right, top or bottom).
 This module is skipped with probability 80\%.\\
 \vspace*{1mm}
 \end{minipage}
@@ -455,7 +460,7 @@
 of applying 1, 2, or 3 patches are (50\%,30\%,20\%).
 \end{minipage}
 
-\vspace*{2mm}
+\vspace*{1mm}
 
 \begin{minipage}[t]{0.25\linewidth}
 \centering
@@ -463,7 +468,7 @@
 {\bf Grey Level \& Contrast}
 \end{minipage}%
 \hspace{-12mm}\begin{minipage}[t]{0.82\linewidth}
-t -m "\vspace*{-18mm}
+\vspace*{-18mm}
 The {\bf grey level and contrast} module changes the contrast by changing grey
 levels, and may invert the image polarity (white to black and black to white). The
 contrast is $C \sim U[1-0.85 \times complexity,1]$ so the image is normalized
 into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
@@ -486,8 +491,7 @@
 \end{figure}
 \fi
 
-
-\vspace*{-2mm}
+\vspace*{-3mm}
 \section{Experimental Setup}
 \vspace*{-1mm}
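The slant transformation patched above ($shift = round(slant \times height)$, $slant \sim U[-complexity, complexity]$) can be sketched in a few lines. This is a hypothetical re-implementation, not the paper's code: it reads "height" as the row index, shifts each row by that amount (the sign of the sampled slant gives the left/right direction), and leaves out-of-range pixels at zero.

```python
import numpy as np

def slant_image(image, complexity, rng=None):
    """Hypothetical sketch: shift row y horizontally by
    round(slant * y), with slant ~ U[-complexity, complexity]."""
    rng = np.random.default_rng() if rng is None else rng
    slant = rng.uniform(-complexity, complexity)
    h, w = image.shape
    out = np.zeros_like(image)
    for y in range(h):
        shift = int(round(slant * y))
        for x in range(w):
            src = x - shift          # source column for this pixel
            if 0 <= src < w:
                out[y, x] = image[y, src]
    return out
```

With `complexity = 0` the sampled slant is exactly 0 and the image passes through unchanged, which makes the module's "no transformation at zero complexity" behaviour easy to check.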
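The element-wise combination corrected in the Gaussian-smoothing hunk, $\frac{image + filtered\_image \times mask}{mask+1}$, is a per-pixel weighted average. A minimal sketch of just that formula (array shapes assumed equal; not the paper's implementation):

```python
import numpy as np

def blend_with_mask(image, filtered_image, mask):
    """Element-wise blend from the patch: where mask is 0 the original
    pixel is kept; larger mask values weight the filtered pixel more."""
    return (image + filtered_image * mask) / (mask + 1.0)
```

Note the two limiting cases: where the mask is zero the original pixel survives exactly, and as mask values grow the result approaches the filtered pixel, which matches the "averaging centers" behaviour described in the text.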
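The grey-level and contrast module samples $C \sim U[1-0.85 \times complexity, 1]$ and maps a $[0,1]$ image into $[\frac{1-C}{2}, 1-\frac{1-C}{2}]$, optionally inverting polarity. A hedged sketch under the assumption that the input is already in $[0,1]$; the excerpt does not state when inversion is applied, so it is exposed here as an explicit flag rather than a sampled event:

```python
import numpy as np

def grey_level_contrast(image, complexity, invert=False, rng=None):
    """Sketch: compress the [0, 1] grey range to contrast C, centered
    at 0.5, then optionally invert polarity (white <-> black)."""
    rng = np.random.default_rng() if rng is None else rng
    C = rng.uniform(1.0 - 0.85 * complexity, 1.0)
    lo = (1.0 - C) / 2.0
    out = lo + image * (1.0 - 2.0 * lo)   # maps [0, 1] onto [lo, 1 - lo]
    return 1.0 - out if invert else out
```

Since the interval $[\frac{1-C}{2}, 1-\frac{1-C}{2}]$ has width $C$ and is centered at $0.5$, lower contrast values squeeze all grey levels toward mid-grey, and $complexity = 0$ forces $C = 1$, i.e. the identity map.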