comparison writeup/nips2010_submission.tex @ 495:5764a2ae1fb5

typos
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Tue, 01 Jun 2010 11:02:10 -0400
parents a194ce5a4249
children e41007dd40e9 2b58eda9fc08
18 \vspace*{-2mm}
19 \begin{abstract}
20 Recent theoretical and empirical work in statistical machine learning has
21 demonstrated the importance of learning algorithms for deep
22 architectures, i.e., function classes obtained by composing multiple
23 non-linear transformations. Self-taught learning (exploiting unlabeled
24 examples or examples from other distributions) has already been applied
25 to deep learners, but mostly to show the advantage of unlabeled
26 examples. Here we explore the advantage brought by {\em out-of-distribution
27 examples} and show that {\em deep learners benefit more from them than a
28 corresponding shallow learner}, in the area
137
138 \vspace*{2mm}
139
140 {\bf Slant.}
141 We mimic slant by shifting each row of the image
142 proportionally to its height: $shift = round(slant \times height)$.
143 The $slant$ coefficient can be negative or positive with equal probability
144 and its value is randomly sampled according to the complexity level:
145 $slant \sim U[0,complexity]$, so the
146 maximum displacement for the lowest or highest pixel line is
147 $round(complexity \times 32)$.\\
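For concreteness, here is a minimal NumPy sketch of this row-shift rule (not the actual pipeline code); it assumes a $32\times32$ image, zero padding for pixels shifted in from outside the frame, and that the ``height'' of a row is its index counted from the top:
\begin{verbatim}
import numpy as np

def apply_slant(image, complexity, rng=np.random):
    # slant ~ U[0, complexity], with a random sign
    slant = rng.uniform(0.0, complexity) * rng.choice([-1.0, 1.0])
    out = np.zeros_like(image)
    height, width = image.shape
    for y in range(height):
        shift = int(round(slant * y))        # displacement grows with row height
        src = np.arange(width) - shift       # source column for each target column
        valid = (src >= 0) & (src < width)   # columns shifted in from outside stay 0
        out[y, valid] = image[y, src[valid]]
    return out
\end{verbatim}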
148 {\bf Thickness.}
149 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
150 are applied. The neighborhood of each pixel is multiplied
151 element-wise with a {\em structuring element} matrix.
152 The pixel value is replaced by the maximum or the minimum of the resulting
153 matrix, respectively for dilation or erosion. Ten different structuring elements with
154 increasing dimensions (largest is $5\times5$) were used. For each image,
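A small NumPy sketch of this greyscale dilation/erosion step may help; it is only an illustration of the rule as stated, and the border padding and the restriction of the max/min to the non-zero entries of the structuring element are assumptions:
\begin{verbatim}
import numpy as np

def morph(image, struct, mode="dilate"):
    # Multiply each pixel's neighborhood element-wise by the structuring
    # element and keep the max (dilation) or min (erosion) of the product.
    sh, sw = struct.shape
    ph, pw = sh // 2, sw // 2
    pad_val = 0.0 if mode == "dilate" else 1.0   # border handling: an assumption
    padded = np.pad(image, ((ph, ph), (pw, pw)), constant_values=pad_val)
    mask = struct > 0                            # only the element's support counts
    out = np.empty_like(image)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            vals = (padded[y:y + sh, x:x + sw] * struct)[mask]
            out[y, x] = vals.max() if mode == "dilate" else vals.min()
    return out
\end{verbatim}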
190 surface and pressing or pulling on the center of the surface''~\citep{GIMP-manual}.
191 For a square input image, think of drawing a circle of
192 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
193 that disk (region inside circle) will have its value recalculated by taking
194 the value of another ``source'' pixel in the original image. The position of
195 that source pixel is found on the line that goes through $C$ and $P$, but
196 at some other distance $d_2$. Define $d_1$ to be the distance between $P$
197 and $C$. $d_2$ is given by $d_2 = \sin(\frac{\pi d_1}{2r})^{-pinch} \times
198 d_1$, where $pinch$ is a parameter to the filter.
199 The actual value is given by bilinear interpolation considering the pixels
200 around the (non-integer) source position thus found.
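The mapping can be made explicit with a short NumPy sketch; the choice of center, radius, the handling of the degenerate $d_1=0$ case and the clamping at image borders are assumptions, the actual filter being the GIMP implementation cited above:
\begin{verbatim}
import numpy as np

def apply_pinch(image, pinch, center, r):
    # Each pixel P inside the disk of radius r around C takes the value of a
    # source pixel at distance d2 = sin(pi*d1/(2r))**(-pinch) * d1 along C->P,
    # read off with bilinear interpolation.
    h, w = image.shape
    cy, cx = center
    out = image.astype(float).copy()
    for y in range(h):
        for x in range(w):
            dy, dx = y - cy, x - cx
            d1 = np.hypot(dy, dx)
            if d1 == 0.0 or d1 >= r:   # center and pixels outside the disk unchanged
                continue
            d2 = np.sin(np.pi * d1 / (2.0 * r)) ** (-pinch) * d1
            sy, sx = cy + dy * d2 / d1, cx + dx * d2 / d1
            y0 = int(np.clip(np.floor(sy), 0, h - 2))   # clamp to image: an assumption
            x0 = int(np.clip(np.floor(sx), 0, w - 2))
            fy, fx = sy - y0, sx - x0
            out[y, x] = ((1 - fy) * (1 - fx) * image[y0, x0]
                         + (1 - fy) * fx * image[y0, x0 + 1]
                         + fy * (1 - fx) * image[y0 + 1, x0]
                         + fy * fx * image[y0 + 1, x0 + 1])
    return out
\end{verbatim}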
233 noise $\sim Normal(0,(\frac{complexity}{10})^2)$.
234 It has a probability of not being applied at all of 70\%.\\
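In code, this amounts to the following sketch (clipping the result back to $[0,1]$ is an assumption, not stated above):
\begin{verbatim}
import numpy as np

def add_gaussian_noise(image, complexity, rng=np.random):
    # Skipped 70% of the time; otherwise add i.i.d. noise with
    # mean 0 and standard deviation complexity/10.
    if rng.uniform() < 0.7:
        return image
    noisy = image + rng.normal(0.0, complexity / 10.0, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)
\end{verbatim}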
235 {\bf Background Images.}
236 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
237 background behind the letter. The background is chosen by first selecting,
238 at random, an image from a set of images. Then a 32$\times$32 sub-region
239 of that image is chosen as the background image (by sampling position
240 uniformly while making sure not to cross image borders).
241 To combine the original letter image and the background image, contrast
242 adjustments are made. We first get the maximal values (i.e. maximal
243 intensity) for both the original image and the background image, $maximage$
250 The number of selected pixels is $0.2 \times complexity$.
251 This filter has a probability of not being applied at all of 75\%.\\
252 {\bf Spatially Gaussian Noise.}
253 Different regions of the image are spatially smoothed.
254 The image is convolved with a symmetric Gaussian kernel of
255 size and variance chosen uniformly in the ranges $[12,12 + 20 \times
256 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
257 between $0$ and $1$. We also create a symmetric averaging window, of the
258 kernel size, with maximum value at the center. For each image we sample
259 uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be
260 averaging centers between the original image and the filtered one. We
266 {\bf Scratches.}
267 The scratches module places line-like white patches on the image. The
268 lines are heavily transformed images of the digit ``1'' (one), chosen
269 at random among five thousand such 1 images. The 1 image is
270 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
271 complexity)^2)$, using bi-cubic interpolation.
272 Two passes of a grey-scale morphological erosion filter
273 are applied, reducing the width of the line
274 by an amount controlled by $complexity$.
275 This filter is applied only 15\% of the time. When it is applied, 50\%
276 of the time only one patch image is generated and applied. In 30\% of
277 cases, two patches are generated, and otherwise three patches are
278 generated. The patches are applied by taking, at each of the $32\times32$ pixel
279 locations, the maximal value over the patches and the original image.\\
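The nested probabilities and the pixel-wise max combination are summarized in the following sketch, where {\tt make\_patch} is a hypothetical helper standing in for the crop/rotate/erode processing of a random ``1'' image described above:
\begin{verbatim}
import numpy as np

def apply_scratches(image, make_patch, rng=np.random):
    # Applied 15% of the time; then 1, 2 or 3 patches with probabilities
    # 0.5, 0.3 and 0.2; combined with the image by a pixel-wise maximum.
    if rng.uniform() >= 0.15:
        return image
    n_patches = rng.choice([1, 2, 3], p=[0.5, 0.3, 0.2])
    out = image.copy()
    for _ in range(n_patches):
        out = np.maximum(out, make_patch())   # make_patch(): transformed "1" image
    return out
\end{verbatim}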
280 {\bf Color and Contrast Changes.}
281 This filter changes the contrast and may invert the image polarity (white
282 on black to black on white). The contrast $C$ is defined here as the
283 difference between the maximum and the minimum pixel value of the image.
284 The new contrast is sampled as $C \sim U[1-0.85 \times complexity,1]$ (so $C \geq 0.15$).
285 The image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
286 polarity is inverted with probability $0.5$.
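Assuming pixel values in $[0,1]$, a minimal sketch of this contrast change is:
\begin{verbatim}
import numpy as np

def change_contrast(image, complexity, rng=np.random):
    # Sample the target contrast C, rescale the image into
    # [(1-C)/2, 1-(1-C)/2], and invert polarity half of the time.
    c = rng.uniform(1.0 - 0.85 * complexity, 1.0)
    lo, hi = image.min(), image.max()
    span = (hi - lo) if hi > lo else 1.0      # guard against flat images: an assumption
    rescaled = (image - lo) / span * c + (1.0 - c) / 2.0
    if rng.uniform() < 0.5:
        rescaled = 1.0 - rescaled             # white on black <-> black on white
    return rescaled
\end{verbatim}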
287
288
289 \begin{figure}[h]
299
300
301 \begin{figure}[h]
302 \resizebox{.99\textwidth}{!}{\includegraphics{images/transfo.png}}\\
303 \caption{Illustration of each transformation applied alone to the same image
304 of an upper-case h (top left). First row (from left to right): original image, slant,
305 thickness, affine transformation, local elastic deformation; second row (from left to right):
306 pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right):
307 background image, salt and pepper noise, spatially Gaussian noise, scratches,
308 color and contrast changes.}
309 \label{fig:transfo}
310 \end{figure}
311
353 %\item
354 {\bf Fonts.}
355 In order to have a good variety of sources, we downloaded a large number of free fonts from {\tt http://anonymous.url.net}.
356 %real address {\tt http://cg.scs.carleton.ca/~luc/freefonts.html}
357 In addition to the Windows 7 fonts, this adds up to a total of $9817$ different fonts from which we can choose uniformly.
358 The {\tt ttf} file is either used as input to the Captcha generator (see next item) or, by producing a corresponding image,
359 directly as input to our models.
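The text does not specify how a {\tt ttf} file is rendered into a $32\times32$ character image; one plausible recipe, using the Pillow imaging library (font size and placement here are arbitrary assumptions), is:
\begin{verbatim}
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_char(ttf_path, char, point_size=28):
    # Draw one character from a .ttf font onto a 32x32 greyscale canvas
    # and return it as an array scaled to [0, 1].
    font = ImageFont.truetype(ttf_path, point_size)
    canvas = Image.new("L", (32, 32), color=0)       # black background
    draw = ImageDraw.Draw(canvas)
    draw.text((2, 0), char, fill=255, font=font)     # rough placement: an assumption
    return np.asarray(canvas, dtype=np.float32) / 255.0
\end{verbatim}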
360
361 %\item
362 {\bf Captchas.}
363 The Captcha data source is an adaptation of the \emph{pycaptcha} library (a Python-based captcha generator) for
364 generating characters of the same format as the NIST dataset. This software is based on
365 a random character class generator and various kinds of transformations similar to those described in the previous sections.
366 In order to increase the variability of the data generated, many different fonts are used for generating the characters.
367 Transformations (slant, distortions, rotation, translation) are applied to each randomly generated character with a complexity
368 depending on the complexity parameter provided by the user of the data source. Two levels of complexity are
369 allowed and can be controlled via an easy-to-use facade class.
370
371 %\item
372 {\bf OCR data.}
373 A large set (2 million) of scanned, OCRed and manually verified machine-printed
374 characters (from various documents and books) were included as an
375 additional source. This set is part of a larger corpus being collected by the Image Understanding
376 Pattern Recognition Research group led by Thomas Breuel at the University of Kaiserslautern
377 ({\tt http://www.iupr.com}), which will be publicly released.
378 %\end{itemize}
379
380 \vspace*{-1mm}
381 \subsection{Data Sets}
382 \vspace*{-1mm}
389 {\bf NIST.} This is the raw NIST Special Database 19.
390
391 %\item
392 {\bf P07.} This dataset is obtained by taking raw characters from all four of the above sources
393 and sending them through the above transformation pipeline.
394 For each new example to be generated, a source is selected with probability $10\%$ from the fonts,
395 $25\%$ from the captchas, $25\%$ from the OCR data and $40\%$ from NIST. We apply all the transformations in the
396 order given above, and for each of them we sample uniformly a complexity in the range $[0,0.7]$.
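The generation of a P07 example is thus a two-level sampling process, sketched below; {\tt draw\_raw\_char} and {\tt pipeline} are hypothetical stand-ins for the four raw sources and for the ordered list of transformation modules described in the previous section:
\begin{verbatim}
import numpy as np

SOURCES = {"fonts": 0.10, "captcha": 0.25, "ocr": 0.25, "nist": 0.40}

def generate_p07_example(draw_raw_char, pipeline, rng=np.random):
    # Pick a raw source according to the 10/25/25/40% mixture, then apply
    # every transformation in order, each with its own complexity ~ U[0, 0.7].
    names, probs = zip(*SOURCES.items())
    source = rng.choice(names, p=probs)
    image = draw_raw_char(source)
    for transform in pipeline:
        complexity = rng.uniform(0.0, 0.7)
        image = transform(image, complexity)
    return image
\end{verbatim}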
397
398 %\item
399 {\bf NISTP.} This one is equivalent to P07 (complexity parameter of $0.7$ with the same source proportions)
400 except that we apply only the
401 transformations from slant to pinch. Therefore, the character is
402 transformed but no additional noise is added to the image, giving images
403 closer to the NIST dataset.
404 %\end{itemize}
405
406 \vspace*{-1mm}
407 \subsection{Models and their Hyperparameters}
473 %of money to perform tasks for which human intelligence is required.
474 %Mechanical Turk has been used extensively in natural language
475 %processing \citep{SnowEtAl2008} and vision
476 %\citep{SorokinAndForsyth2008,whitehill09}.
477 AMT users were presented
478 with 10 character images and asked to type 10 corresponding ASCII
479 characters. They were forced to make a hard choice among the
480 62 or 10 character classes (all classes or digits only).
481 Three users classified each image, allowing
482 us to estimate inter-human variability (shown as +/- in parentheses below).
483
553 fine-tuned on NIST.
554
555 Our results show that the MLP benefits marginally from the multi-task setting
556 in the case of digits (5\% relative improvement) but is actually hurt in the case
557 of characters (respectively 3\% and 4\% worse for lower- and upper-case characters).
558 On the other hand, the SDA benefited from the multi-task setting, with relative
559 error rate improvements of 27\%, 15\% and 13\% respectively for digits,
560 lower- and upper-case characters, as shown in Table~\ref{tab:multi-task}.
561 \fi
562
563
593 noise, affine transformations, background images) make the resulting
594 classifier better not only on similarly perturbed images but also on
595 the {\em original clean examples}? Do deep architectures benefit more from such {\em out-of-distribution}
596 examples, i.e. do they benefit more from the self-taught learning~\citep{RainaR2007} framework?
597 MLPs were helped by perturbed training examples when tested on perturbed input images,
598 but only marginally helped with respect to clean examples. On the other hand, the deep SDAs
599 were very significantly boosted by these out-of-distribution examples.
600
601 $\bullet$ %\item
602 Similarly, does the feature learning step in deep learning algorithms benefit more from
603 training with similar but different classes (i.e. a multi-task learning scenario) than