diff writeup/nips2010_submission.tex @ 495:5764a2ae1fb5
typos
author    Yoshua Bengio <bengioy@iro.umontreal.ca>
date      Tue, 01 Jun 2010 11:02:10 -0400
parents   a194ce5a4249
children  e41007dd40e9 2b58eda9fc08
--- a/writeup/nips2010_submission.tex  Tue Jun 01 07:56:00 2010 -0400
+++ b/writeup/nips2010_submission.tex  Tue Jun 01 11:02:10 2010 -0400
@@ -20,7 +20,7 @@
 Recent theoretical and empirical work in statistical machine learning has
 demonstrated the importance of learning algorithms for deep
 architectures, i.e., function classes obtained by composing multiple
-non-linear transformations. The self-taught learning (exploitng unlabeled
+non-linear transformations. The self-taught learning (exploiting unlabeled
 examples or examples from other distributions) has already been applied
 to deep learners, but mostly to show the advantage of unlabeled
 examples. Here we explore the advantage brought by {\em out-of-distribution
@@ -139,14 +139,14 @@
 {\bf Slant.}
 We mimic slant by shifting each row of the image
-proportionnaly to its height: $shift = round(slant \times height)$.
+proportionally to its height: $shift = round(slant \times height)$.
 The $slant$ coefficient can be negative or positive with equal
 probability and its value is randomly sampled according to the
 complexity level: $slant \sim U[0,complexity]$, so the maximum
 displacement for the lowest or highest pixel line is of
 $round(complexity \times 32)$.\\
 {\bf Thickness.}
-Morpholigical operators of dilation and erosion~\citep{Haralick87,Serra82}
+Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
 are applied. The neighborhood of each pixel is multiplied
 element-wise with a {\em structuring element} matrix.
 The pixel value is replaced by the maximum or the minimum of the
 resulting
@@ -192,7 +192,7 @@
 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
 that disk (region inside circle) will have its value recalculated by taking
 the value of another "source" pixel in the original image. The position of
-that source pixel is found on the line thats goes through $C$ and $P$, but
+that source pixel is found on the line that goes through $C$ and $P$, but
 at some other distance $d_2$. Define $d_1$ to be the distance between $P$
 and $C$. $d_2$ is given by $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times
 d_1$, where $pinch$ is a parameter to the filter.
@@ -235,7 +235,7 @@
 {\bf Background Images.}
 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
 background behind the letter. The background is chosen by first selecting,
-at random, an image from a set of images. Then a 32$\times$32 subregion
+at random, an image from a set of images. Then a 32$\times$32 sub-region
 of that image is chosen as the background image (by sampling position
 uniformly while making sure not to cross image borders). To combine
 the original letter image and the background image, contrast
@@ -252,7 +252,7 @@
 {\bf Spatially Gaussian Noise.}
 Different regions of the image are spatially smoothed.
 The image is convolved with a symmetric Gaussian kernel of
-size and variance choosen uniformly in the ranges $[12,12 + 20 \times
+size and variance chosen uniformly in the ranges $[12,12 + 20 \times
 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
 between $0$ and $1$. We also create a symmetric averaging window, of the
 kernel size, with maximum value at the center. For each image we sample
@@ -268,8 +268,8 @@
 lines are heavily transformed images of the digit "1" (one), chosen
 at random among five thousands such 1 images.
 The 1 image is randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
-complexity)^2$, using bicubic interpolation,
-Two passes of a greyscale morphological erosion filter
+complexity)^2$, using bi-cubic interpolation,
+Two passes of a grey-scale morphological erosion filter
 are applied, reducing the width of the line
 by an amount controlled by $complexity$.
 This filter is only applied only 15\% of the time. When it is applied, 50\%
@@ -278,10 +278,10 @@
 generated. The patch is applied by taking the maximal value on any given
 patch or the original image, for each of the 32x32 pixel locations.\\
 {\bf Color and Contrast Changes.}
-This filter changes the constrast and may invert the image polarity (white
+This filter changes the contrast and may invert the image polarity (white
 on black to black on white). The contrast $C$ is defined here
 as the difference between the maximum and the minimum pixel
 value of the image.
-Contrast $\sim U[1-0.85 \times complexity,1]$ (so constrast $\geq 0.15$).
+Contrast $\sim U[1-0.85 \times complexity,1]$ (so contrast $\geq 0.15$).
 The image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
 polarity is inverted with $0.5$ probability.
@@ -301,9 +301,9 @@
 \begin{figure}[h]
 \resizebox{.99\textwidth}{!}{\includegraphics{images/transfo.png}}\\
 \caption{Illustration of each transformation applied alone to the same image
-of an upper-case h (top left). First row (from left to rigth) : original image, slant,
-thickness, affine transformation, local elastic deformation; second row (from left to rigth) :
-pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to rigth) :
+of an upper-case h (top left). First row (from left to right) : original image, slant,
+thickness, affine transformation, local elastic deformation; second row (from left to right) :
+pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right) :
 background image, salt and pepper noise, spatially Gaussian noise, scratches,
 color and contrast changes.}
 \label{fig:transfo}
@@ -355,16 +355,16 @@
 In order to have a good variety of sources we downloaded an important number of free fonts from:
 {\tt http://anonymous.url.net}
 %real adress {\tt http://cg.scs.carleton.ca/~luc/freefonts.html}
 in addition to Windows 7's, this adds up to a total of $9817$ different fonts that we can choose uniformly.
-The ttf file is either used as input of the Captcha generator (see next item) or, by producing a corresponding image,
+The {\tt ttf} file is either used as input of the Captcha generator (see next item) or, by producing a corresponding image,
 directly as input to our models.
 %\item
 {\bf Captchas.}
 The Captcha data source is an adaptation of the \emph{pycaptcha} library
 (a python based captcha generator library) for generating characters
 of the same format as the NIST dataset. This software is based on
-a random character class generator and various kinds of tranformations similar to those described in the previous sections.
+a random character class generator and various kinds of transformations similar to those described in the previous sections.
 In order to increase the variability of the data generated, many different fonts are used for generating the characters.
-Transformations (slant, distorsions, rotation, translation) are applied to each randomly generated character with a complexity
+Transformations (slant, distortions, rotation, translation) are applied to each randomly generated character with a complexity
 depending on the value of the complexity parameter provided by the user of the data source.
 Two levels of complexity are allowed and can be controlled via an easy to use facade class.
@@ -374,7 +374,7 @@
 characters (from various documents and books) where included as an additional source. This set is part of a larger
 corpus being collected by the Image Understanding Pattern Recognition Research group lead by Thomas
 Breuel at University of Kaiserslautern
-({\tt http://www.iupr.com}), and which will be publically released.
+({\tt http://www.iupr.com}), and which will be publicly released.
 %\end{itemize}
 \vspace*{-1mm}
@@ -391,7 +391,7 @@
 %\item
 {\bf P07.} This dataset is obtained by taking raw characters from all four of the above sources
 and sending them through the above transformation pipeline.
-For each new exemple to generate, a source is selected with probability $10\%$ from the fonts,
+For each new example to generate, a source is selected with probability $10\%$ from the fonts,
 $25\%$ from the captchas, $25\%$ from the OCR data and $40\%$ from NIST.
 We apply all the transformations in the order given above, and for each of them
 we sample uniformly a complexity in the range $[0,0.7]$.
@@ -399,7 +399,7 @@
 {\bf NISTP.} This one is equivalent to P07 (complexity parameter of $0.7$ with the same
 sources proportion) except that we only apply
 transformations from slant to pinch. Therefore, the character is
-transformed but no additionnal noise is added to the image, giving images
+transformed but no additional noise is added to the image, giving images
 closer to the NIST dataset.
 %\end{itemize}
@@ -475,7 +475,7 @@
 %processing \citep{SnowEtAl2008} and vision
 %\citep{SorokinAndForsyth2008,whitehill09}.
 AMT users where presented
-with 10 character images and asked to type 10 corresponding ascii
+with 10 character images and asked to type 10 corresponding ASCII
 characters. They were forced to make a hard choice among the
 62 or 10 character classes (all classes or digits only).
 Three users classified each image, allowing
@@ -555,7 +555,7 @@
 Our results show that the MLP benefits marginally from the multi-task setting
 in the case of digits (5\% relative improvement) but is actually hurt in the case
 of characters (respectively 3\% and 4\% worse for lower and upper class characters).
-On the other hand the SDA benefitted from the multi-task setting, with relative
+On the other hand the SDA benefited from the multi-task setting, with relative
 error rate improvements of 27\%, 15\% and 13\% respectively for digits,
 lower and upper case characters, as shown in Table~\ref{tab:multi-task}.
 \fi
@@ -595,7 +595,7 @@
 the {\em original clean examples}? Do deep architectures benefit more from such
 {\em out-of-distribution} examples, i.e. do they benefit more from the self-taught
 learning~\citep{RainaR2007} framework? MLPs were helped by perturbed training examples when tested on perturbed input images,
-but only marginally helped wrt clean examples. On the other hand, the deep SDAs
+but only marginally helped with respect to clean examples. On the other hand, the deep SDAs
 were very significantly boosted by these out-of-distribution examples.
 $\bullet$ %\item
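The hunk around line 139 of the paper describes the slant transformation: each row of the 32x32 image is shifted horizontally by $round(slant \times height)$, with the sign of $slant$ chosen with equal probability and its magnitude drawn from $U[0, complexity]$. The NumPy sketch below is only an illustration of that description, not the project's actual code; zero-filling the vacated pixels is an assumption.

import numpy as np

def apply_slant(image, complexity, rng=np.random):
    # Sign is +/- with equal probability; magnitude ~ U[0, complexity].
    slant = rng.uniform(0.0, complexity) * rng.choice([-1, 1])
    out = np.zeros_like(image)
    for y in range(image.shape[0]):
        shift = int(round(slant * y))      # shift = round(slant * height)
        if shift > 0:
            out[y, shift:] = image[y, :-shift]
        elif shift < 0:
            out[y, :shift] = image[y, -shift:]
        else:
            out[y] = image[y]
    return out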
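The thickness transformation in the same hunk applies grey-scale morphological dilation or erosion with a structuring element, replacing each pixel by the maximum or minimum over its neighborhood. A minimal sketch using SciPy's ndimage morphology; the flat 3x3 structuring element is a placeholder, since the paper samples the element according to the complexity level.

from scipy import ndimage

def change_thickness(image, thicken, size=3):
    # Dilation takes the neighborhood maximum (thicker strokes);
    # erosion takes the minimum (thinner strokes).
    if thicken:
        return ndimage.grey_dilation(image, size=(size, size))
    return ndimage.grey_erosion(image, size=(size, size))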
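The hunk around line 192 gives the pinch filter: a pixel $P$ inside the disk of radius $r$ around the centre $C$ takes its value from a source pixel at distance $d_2 = sin(\frac{\pi d_1}{2r})^{-pinch} \times d_1$ along the line through $C$ and $P$. Below is a direct, unoptimized reading of that formula; nearest-neighbour sampling is an assumption, and the GIMP-based filter used in the paper may interpolate differently.

import numpy as np

def apply_pinch(image, center, radius, pinch):
    cy, cx = center
    h, w = image.shape
    out = image.copy()
    for y in range(h):
        for x in range(w):
            d1 = np.hypot(y - cy, x - cx)
            if 0.0 < d1 < radius:
                # d2 = sin(pi*d1 / (2r))^(-pinch) * d1
                d2 = np.sin(np.pi * d1 / (2.0 * radius)) ** (-pinch) * d1
                sy = int(round(cy + (y - cy) * d2 / d1))
                sx = int(round(cx + (x - cx) * d2 / d1))
                if 0 <= sy < h and 0 <= sx < w:
                    out[y, x] = image[sy, sx]
    return out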
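The colour/contrast hunk around line 278 specifies Contrast $C \sim U[1-0.85 \times complexity, 1]$, rescales the image into $[\frac{1-C}{2}, 1-\frac{1-C}{2}]$, and inverts the polarity with probability 0.5. A sketch under the assumption that pixel values lie in $[0, 1]$ and that "normalized" means a min-max rescaling:

import numpy as np

def change_contrast(image, complexity, rng=np.random):
    c = rng.uniform(1.0 - 0.85 * complexity, 1.0)    # so c >= 0.15
    lo, hi = (1.0 - c) / 2.0, 1.0 - (1.0 - c) / 2.0
    mn, mx = image.min(), image.max()
    scaled = (image - mn) / (mx - mn) if mx > mn else np.zeros_like(image)
    out = lo + scaled * (hi - lo)                    # map into [lo, hi]
    if rng.uniform(0.0, 1.0) < 0.5:
        out = 1.0 - out                              # invert polarity
    return out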
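Finally, the P07 hunk around line 391 fixes the source mixture (10% fonts, 25% captchas, 25% OCR, 40% NIST) and samples a per-transformation complexity uniformly in $[0, 0.7]$. The sampling step, as a tiny sketch with placeholder source names:

import numpy as np

def sample_p07_source(rng=np.random):
    sources = ["fonts", "captcha", "ocr", "nist"]
    return rng.choice(sources, p=[0.10, 0.25, 0.25, 0.40])

def sample_complexity(rng=np.random):
    # Each transformation in the pipeline gets its own complexity in [0, 0.7].
    return rng.uniform(0.0, 0.7)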