ift6266
changeset 495:5764a2ae1fb5
typos
author    Yoshua Bengio <bengioy@iro.umontreal.ca>
date      Tue, 01 Jun 2010 11:02:10 -0400
parents   405cabc08c92
children  e41007dd40e9 2b58eda9fc08
files     writeup/nips2010_submission.tex
diffstat  1 files changed, 22 insertions(+), 22 deletions(-)
--- a/writeup/nips2010_submission.tex	Tue Jun 01 07:56:00 2010 -0400
+++ b/writeup/nips2010_submission.tex	Tue Jun 01 11:02:10 2010 -0400
@@ -20,7 +20,7 @@
 Recent theoretical and empirical work in statistical machine learning has
 demonstrated the importance of learning algorithms for deep
 architectures, i.e., function classes obtained by composing multiple
-non-linear transformations. The self-taught learning (exploitng unlabeled
+non-linear transformations. The self-taught learning (exploiting unlabeled
 examples or examples from other distributions) has already been applied
 to deep learners, but mostly to show the advantage of unlabeled
 examples. Here we explore the advantage brought by {\em out-of-distribution
@@ -139,14 +139,14 @@
 {\bf Slant.}
 We mimic slant by shifting each row of the image
-proportionnaly to its height: $shift = round(slant \times height)$.
+proportionally to its height: $shift = round(slant \times height)$.
 The $slant$ coefficient can be negative or positive with equal probability
 and its value is randomly sampled according to the complexity level:
 e $slant \sim U[0,complexity]$, so the
 maximum displacement for the lowest or highest pixel line is of
 $round(complexity \times 32)$.\\
 {\bf Thickness.}
-Morpholigical operators of dilation and erosion~\citep{Haralick87,Serra82}
+Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
 are applied.
 The neighborhood of each pixel is multiplied
 element-wise with a {\em structuring element} matrix.
 The pixel value is replaced by the maximum or the minimum of the resulting
@@ -192,7 +192,7 @@
 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
 that disk (region inside circle) will have its value recalculated by taking
 the value of another "source" pixel in the original image. The position of
-that source pixel is found on the line thats goes through $C$ and $P$, but
+that source pixel is found on the line that goes through $C$ and $P$, but
 at some other distance $d_2$. Define $d_1$ to be the distance between $P$
 and $C$. $d_2$ is given by $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times
 d_1$, where $pinch$ is a parameter to the filter.
@@ -235,7 +235,7 @@
 {\bf Background Images.}
 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
 background behind the letter. The background is chosen by first selecting,
-at random, an image from a set of images. Then a 32$\times$32 subregion
+at random, an image from a set of images. Then a 32$\times$32 sub-region
 of that image is chosen as the background image (by sampling position
 uniformly while making sure not to cross image borders).
 To combine the original letter image and the background image, contrast
@@ -252,7 +252,7 @@
 {\bf Spatially Gaussian Noise.}
 Different regions of the image are spatially smoothed.
 The image is convolved with a symmetric Gaussian kernel of
-size and variance choosen uniformly in the ranges $[12,12 + 20 \times
+size and variance chosen uniformly in the ranges $[12,12 + 20 \times
 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
 between $0$ and $1$. We also create a symmetric averaging window, of the
 kernel size, with maximum value at the center. For each image we sample
@@ -268,8 +268,8 @@
 lines are heavily transformed images of the digit "1" (one), chosen
 at random among five thousands such 1 images. The 1 image is
 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
-complexity)^2$, using bicubic interpolation,
-Two passes of a greyscale morphological erosion filter
+complexity)^2$, using bi-cubic interpolation,
+Two passes of a grey-scale morphological erosion filter
 are applied, reducing the width of the line
 by an amount controlled by $complexity$.
 This filter is only applied only 15\% of the time. When it is applied, 50\%
@@ -278,10 +278,10 @@
 generated. The patch is applied by taking the maximal value on any given
 patch or the original image, for each of the 32x32 pixel locations.\\
 {\bf Color and Contrast Changes.}
-This filter changes the constrast and may invert the image polarity (white
+This filter changes the contrast and may invert the image polarity (white
 on black to black on white). The contrast $C$ is defined here as the
 difference between the maximum and the minimum pixel value of the image.
-Contrast $\sim U[1-0.85 \times complexity,1]$ (so constrast $\geq 0.15$).
+Contrast $\sim U[1-0.85 \times complexity,1]$ (so contrast $\geq 0.15$).
 The image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$.
 The polarity is inverted with $0.5$ probability.
@@ -301,9 +301,9 @@
 \begin{figure}[h]
 \resizebox{.99\textwidth}{!}{\includegraphics{images/transfo.png}}\\
 \caption{Illustration of each transformation applied alone to the same image
-of an upper-case h (top left). First row (from left to rigth) : original image, slant,
-thickness, affine transformation, local elastic deformation; second row (from left to rigth) :
-pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to rigth) :
+of an upper-case h (top left). First row (from left to right) : original image, slant,
+thickness, affine transformation, local elastic deformation; second row (from left to right) :
+pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right) :
 background image, salt and pepper noise, spatially Gaussian noise, scratches,
 color and contrast changes.}
 \label{fig:transfo}
@@ -355,16 +355,16 @@
 In order to have a good variety of sources we downloaded an important number of free fonts from:
 {\tt http://anonymous.url.net} %real adress {\tt http://cg.scs.carleton.ca/~luc/freefonts.html}
 in addition to Windows 7's, this adds up to a total of $9817$ different fonts that we can choose uniformly.
-The ttf file is either used as input of the Captcha generator (see next item) or, by producing a corresponding image,
+The {\tt ttf} file is either used as input of the Captcha generator (see next item) or, by producing a corresponding image,
 directly as input to our models.
 %\item
 {\bf Captchas.}
 The Captcha data source is an adaptation of the \emph{pycaptcha} library (a python based captcha
 generator library) for generating characters of the same format as the NIST dataset. This software is based on
-a random character class generator and various kinds of tranformations similar to those described in the previous sections.
+a random character class generator and various kinds of transformations similar to those described in the previous sections.
 In order to increase the variability of the data generated,
 many different fonts are used for generating the characters.
-Transformations (slant, distorsions, rotation, translation) are applied to each randomly generated character with a complexity
+Transformations (slant, distortions, rotation, translation) are applied to each randomly generated character with a complexity
 depending on the value of the complexity parameter provided by the user of the data source.
 Two levels of complexity are allowed and can be controlled
 via an easy to use facade class.
@@ -374,7 +374,7 @@
 characters (from various documents and books) where included as an additional source.
 This set is part of a larger corpus being collected by the Image Understanding
 Pattern Recognition Research group lead by Thomas Breuel at University of Kaiserslautern
-({\tt http://www.iupr.com}), and which will be publically released.
+({\tt http://www.iupr.com}), and which will be publicly released.
 %\end{itemize}
 \vspace*{-1mm}
@@ -391,7 +391,7 @@
 %\item
 {\bf P07.} This dataset is obtained by taking raw characters from all four of the above sources
 and sending them through the above transformation pipeline.
-For each new exemple to generate, a source is selected with probability $10\%$ from the fonts,
+For each new example to generate, a source is selected with probability $10\%$ from the fonts,
 $25\%$ from the captchas, $25\%$ from the OCR data and $40\%$ from NIST.
 We apply all the transformations in the order given above, and for each of them
 we sample uniformly a complexity in the range $[0,0.7]$.
@@ -399,7 +399,7 @@
 {\bf NISTP.} This one is equivalent to P07 (complexity parameter of $0.7$ with the same
 sources proportion) except that we only apply transformations from slant to
 pinch. Therefore, the character is
- transformed but no additionnal noise is added to the image, giving images
+ transformed but no additional noise is added to the image, giving images
 closer to the NIST dataset.
 %\end{itemize}
@@ -475,7 +475,7 @@
 %processing \citep{SnowEtAl2008} and vision
 %\citep{SorokinAndForsyth2008,whitehill09}.
 AMT users where presented
-with 10 character images and asked to type 10 corresponding ascii
+with 10 character images and asked to type 10 corresponding ASCII
 characters. They were forced to make a hard choice among the
 62 or 10 character classes (all classes or digits only).
 Three users classified each image, allowing
@@ -555,7 +555,7 @@
 Our results show that the MLP benefits marginally from the multi-task setting
 in the case of digits (5\% relative improvement) but is actually hurt in the case
 of characters (respectively 3\% and 4\% worse for lower and upper class characters).
-On the other hand the SDA benefitted from the multi-task setting, with relative
+On the other hand the SDA benefited from the multi-task setting, with relative
 error rate improvements of 27\%, 15\% and 13\% respectively for digits,
 lower and upper case characters, as shown in Table~\ref{tab:multi-task}.
 \fi
@@ -595,7 +595,7 @@
 the {\em original clean examples}? Do deep architectures benefit more from
 such {\em out-of-distribution} examples, i.e. do they benefit more from
 the self-taught learning~\citep{RainaR2007} framework? MLPs were helped
 by perturbed training examples when tested on perturbed input images,
-but only marginally helped wrt clean examples. On the other hand, the deep SDAs
+but only marginally helped with respect to clean examples. On the other hand, the deep SDAs
 were very significantly boosted by these out-of-distribution examples.
 $\bullet$
 %\item
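Several of the hunks above sit next to the transformation formulas the paper describes (the slant shift, the pinch remapping of source distances, and the contrast rescaling), even though this changeset only fixes spelling and wording. As a reading aid only, the following minimal NumPy sketch restates those three formulas; the function names (apply_slant, pinch_source_distance, change_contrast), the assumption of 2-D grey-scale images with values in [0, 1], and the use of NumPy are illustrative choices and are not taken from the ift6266 pipeline code.

    import numpy as np

    def apply_slant(image, complexity, rng=np.random):
        # Slant: shift row y by round(slant * y) pixels, where
        # slant ~ U[0, complexity] with a random sign.
        slant = rng.uniform(0.0, complexity) * rng.choice([-1.0, 1.0])
        out = np.zeros_like(image)
        for y in range(image.shape[0]):
            out[y] = np.roll(image[y], int(round(slant * y)))
        return out

    def pinch_source_distance(d1, r, pinch):
        # Pinch: a pixel at distance 0 < d1 <= r from the center takes its value
        # from the source pixel at d2 = sin(pi * d1 / (2 r))^(-pinch) * d1
        # on the same ray through the center.
        return np.sin(np.pi * d1 / (2.0 * r)) ** (-pinch) * d1

    def change_contrast(image, complexity, rng=np.random):
        # Contrast C ~ U[1 - 0.85 * complexity, 1]; rescale pixel values into
        # [(1 - C) / 2, 1 - (1 - C) / 2], then invert polarity with probability 0.5.
        C = rng.uniform(1.0 - 0.85 * complexity, 1.0)
        lo, hi = (1.0 - C) / 2.0, 1.0 - (1.0 - C) / 2.0
        span = image.max() - image.min()
        out = lo + (hi - lo) * (image - image.min()) / (span if span > 0 else 1.0)
        return 1.0 - out if rng.uniform() < 0.5 else out

Since complexity is at most 1, the lower bound on C is 1 - 0.85 = 0.15, which matches the "contrast >= 0.15" constraint quoted in the hunk above. The use of np.roll (wrap-around rather than padding) in apply_slant is a simplification of the sketch, not a claim about the actual implementation.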