Mercurial > ift6266
comparison: writeup/nips2010_submission.tex @ 501:5927432d8b8d

author:   Yoshua Bengio <bengioy@iro.umontreal.ca>
date:     Tue, 01 Jun 2010 12:28:05 -0400
parents:  8479bf822d0e
children: 2b35a6e5ece4 e837ef6eef8c
51 different representation of the raw visual input. In fact,
52 it was found recently that the features learnt in deep architectures resemble
53 those observed in the first two of these stages (in areas V1 and V2
54 of visual cortex)~\citep{HonglakL2008}, and that they become more and
55 more invariant to factors of variation (such as camera movement) in
56 higher layers~\citep{Goodfellow2009}.
57 Learning a hierarchy of features increases the
58 ease and practicality of developing representations that are at once
59 tailored to specific tasks, yet are able to borrow statistical strength
60 from other related tasks (e.g., modeling different kinds of objects). Finally, learning the
61 feature representation can lead to higher-level (more abstract, more
130 amount of deformation or noise introduced.
131
132 There are two main parts in the pipeline. The first one,
133 from slant to pinch below, performs transformations. The second
134 part, from blur to contrast, adds different kinds of noise.
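The two-part structure (geometric transformations first, then noise) can be sketched as a simple composition of per-step functions. This is only an illustrative sketch, not the paper's implementation: the step functions below are placeholders, the toy "image" is a flat list of grey levels in $[0,1]$, and the `complexity` parameter is an assumed stand-in for the per-example amount of deformation mentioned above.

```python
# Hedged sketch of the two-part pipeline: transformations, then noise.
# A complexity value in [0, 1] controls how strongly each step acts
# (placeholder semantics, not the paper's actual parameterization).

def apply_pipeline(image, transformations, noises, complexity=0.5):
    """First apply geometric transformations, then add noise."""
    for transform in transformations:
        image = transform(image, complexity)
    for noise in noises:
        image = noise(image, complexity)
    return image

# Toy stand-ins: an "image" is a flat list of grey levels in [0, 1].
def slant(img, c):
    # placeholder for the slant transformation
    return [min(1.0, p + 0.1 * c) for p in img]

def clamp_noise(img, c):
    # placeholder noise step that just keeps values inside [0, 1]
    return [max(0.0, min(1.0, p)) for p in img]

out = apply_pipeline([0.0, 0.5], [slant], [clamp_noise], complexity=1.0)
```

The point of the sketch is only the ordering constraint stated in the text: every transformation runs before any noise step.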
135
136 \begin{figure}[h]
137 \resizebox{.99\textwidth}{!}{\includegraphics{images/transfo.png}}\\
138 \caption{Illustration of each transformation applied alone to the same image
139 of an upper-case h (top left). First row (from left to right): original image, slant,
140 thickness, affine transformation (translation, rotation, shear),
141 local elastic deformation; second row (from left to right):
142 pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right):
143 background image, salt and pepper noise, spatially Gaussian noise, scratches,
144 grey level and contrast changes.}
145 \label{fig:transfo}
146 \end{figure}
147
148 {\large\bf Transformations}
149
150 \vspace*{2mm}
151
308 (bottom right) is used as a training example.}
309 \label{fig:pipeline}
310 \end{figure}
311 \fi
312
313
314 \vspace*{-1mm}
315 \section{Experimental Setup}
316 \vspace*{-1mm}
317
318 Whereas much previous work on deep learning algorithms had been performed on
319 the MNIST digits classification task~\citep{Hinton06,ranzato-07,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
320 with 60~000 examples, and variants involving 10~000
321 examples~\citep{Larochelle-jmlr-toappear-2008,VincentPLarochelleH2008}, we want
322 to focus here on the case of much larger training sets, from 10 times to
323 1000 times larger. The larger datasets are obtained by first sampling from
324 a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
325 and {\bf OCR data} (scanned machine-printed characters). Once a character
326 is sampled from one of these sources (chosen randomly), a pipeline of
332 \vspace*{-1mm}
333
334 %\begin{itemize}
335 %\item
336 {\bf NIST.}
337 Our main source of characters is the NIST Special Database 19~\citep{Grother-1995},
338 widely used for training and testing character
339 recognition systems~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002,Milgram+al-2005}.
340 The dataset is composed of 814255 digits and characters (upper and lower case), with hand-checked classifications,
341 extracted from handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes
342 corresponding to "0"-"9", "A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity.
343 The fourth series, $hsf_4$, experimentally recognized to be the most difficult one, is recommended
344 by NIST as a testing set and is used for that purpose in our work and in some previous work~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002,Milgram+al-2005}.
345 We randomly split the remainder into a training set and a validation set for
346 model selection. The sizes of these data sets are: 651668 for training, 80000 for validation,
347 and 82587 for testing.
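The random train/validation split described above can be sketched as follows. Only the resulting set sizes (651668 and 80000, out of the 731668 non-test examples) come from the text; the shuffling procedure and seed below are assumptions for illustration.

```python
import random

# Sketch: split the non-test NIST examples (814255 - 82587 = 731668)
# into 651668 training and 80000 validation examples.
# The seed is an arbitrary assumption; the text only fixes the sizes.
def split_remainder(indices, n_valid=80000, seed=0):
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return idx[n_valid:], idx[:n_valid]  # (train, valid)

train_idx, valid_idx = split_remainder(range(731668))
```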
348 The performance reported by previous work on that dataset mostly uses only the digits.
349 Here we use all the classes in both the training and testing phases. This is especially
387 All data sets contain 32$\times$32 grey-level images (values in $[0,1]$) associated with a label
388 from one of the 62 character classes.
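The 62-class labeling can be made concrete with a small index-to-character mapping; the ordering (digits, then upper case, then lower case) is an assumption consistent with the "0"-"9", "A"-"Z", "a"-"z" listing above, not something the text fixes.

```python
import string

# 62 classes: "0"-"9", "A"-"Z", "a"-"z" (ordering assumed from the text).
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase

def label_to_char(label):
    """Map a class index in [0, 61] to its character."""
    return CLASSES[label]

def char_to_label(ch):
    """Map a character to its class index."""
    return CLASSES.index(ch)
```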
389 %\begin{itemize}
390
391 %\item
392 {\bf NIST.} This is the raw NIST Special Database 19~\citep{Grother-1995}.
393
394 %\item
395 {\bf P07.} This dataset is obtained by taking raw characters from all four of the above sources
396 and sending them through the above transformation pipeline.
397 To generate each new example, a source is selected with probability $10\%$ from the fonts,