# HG changeset patch
# User Yoshua Bengio
# Date 1275445535 14400
# Node ID caf7769ca19c0e71bb5b27e1f112e692213e3fb0
# Parent 4d6493d171f6ed246f104f3def1114ef600664b1
typo

diff -r 4d6493d171f6 -r caf7769ca19c writeup/nips2010_submission.tex
--- a/writeup/nips2010_submission.tex Tue Jun 01 22:12:13 2010 -0400
+++ b/writeup/nips2010_submission.tex Tue Jun 01 22:25:35 2010 -0400
@@ -357,15 +357,15 @@
 to focus here on the case of much larger training sets, from 10 times to
 to 1000 times larger.
 
-The first step in constructing the larger datasets is to sample from
+The first step in constructing the larger datasets (called NISTP and P07) is to sample from
 a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
 and {\bf OCR data} (scanned machine printed characters). Once a character
 is sampled from one of these sources (chosen randomly), the pipeline of
 the transformations and/or noise processes described in section \ref{s:perturbations}
 is applied to the image.
 
-We compare the best MLP against
-the best SDA (both models' hyper-parameters are selected to minimize the validation set error),
+We compare the best MLPs against
+the best SDAs (both models' hyper-parameters are selected to minimize the validation set error),
 along with a comparison against a precise estimate
 of human performance obtained via Amazon's Mechanical Turk (AMT)
 service (http://mturk.com).
@@ -446,7 +446,7 @@
 
 %\item
 {\bf NIST.} This is the raw NIST special database 19~\citep{Grother-1995}. It has
-\{651668 / 80000 / 82587\} \{training / validation / test} examples.
+\{651668 / 80000 / 82587\} \{training / validation / test\} examples.
 
 %\item
 {\bf P07.} This dataset is obtained by taking raw characters from all four of the above sources
@@ -454,7 +454,7 @@
 For each new example to generate, a data source is selected with probability $10\%$ from the fonts,
 $25\%$ from the captchas, $25\%$ from the OCR data and $40\%$ from NIST.
 We apply all the transformations in the order given above, and for each of them we sample uniformly a \emph{complexity} in the range $[0,0.7]$.
-It has \{81920000 / 80000 / 20000\} \{training / validation / test} examples.
+It has \{81920000 / 80000 / 20000\} \{training / validation / test\} examples.
 
 %\item
 {\bf NISTP.} This one is equivalent to P07 (complexity parameter of $0.7$ with the same proportions of data sources)
@@ -462,7 +462,7 @@
 transformations from slant to pinch. Therefore, the character is
 transformed but no additional noise is added to the image, giving images
 closer to the NIST dataset.
-It has \{81920000 / 80000 / 20000\} \{training / validation / test} examples.
+It has \{81920000 / 80000 / 20000\} \{training / validation / test\} examples.
 %\end{itemize}
 
 \vspace*{-1mm}
@@ -525,7 +525,8 @@
 Auto-Encoder is presented with a stochastically corrupted version
 of the input and trained to reconstruct the uncorrupted input,
 forcing the hidden units to represent the leading regularities in
-the data. Once it is trained, its hidden units activations can
+the data. Once it is trained, in a purely unsupervised way,
+its hidden units activations can
 be used as inputs for training a second one, etc.
 After this unsupervised pre-training stage, the parameters
 are used to initialize a deep MLP, which is fine-tuned by
@@ -560,20 +561,24 @@
 %\vspace*{-1mm}
 %\subsection{SDA vs MLP vs Humans}
 %\vspace*{-1mm}
-
+The models are either trained on NIST (MLP0 and SDA0),
+NISTP (MLP1 and SDA1), or P07 (MLP2 and SDA2), and tested
+on either NIST, NISTP or P07, either on all 62 classes
+or only on the digits (considering only the outputs
+associated with digit classes).
 Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
-comparing Humans, three MLPs (MLP0, MLP1, MLP2) and three SDAs (SDA0, SDA1,
+comparing Humans, the three MLPs (MLP0, MLP1, MLP2) and the three SDAs (SDA0, SDA1,
 SDA2), along with the previous results on the digits NIST special database
 19 test set from the literature respectively based on ARTMAP neural
 networks ~\citep{Granger+al-2007}, fast nearest-neighbor search
 ~\citep{Cortes+al-2000}, MLPs ~\citep{Oliveira+al-2002-short}, and SVMs
 ~\citep{Milgram+al-2005}. More detailed and complete numerical results
 (figures and tables, including standard errors on the error rates) can be
-found in Appendix I of the supplementary material. The 3 kinds of model differ in the
-training sets used: NIST only (MLP0,SDA0), NISTP (MLP1, SDA1), or P07
-(MLP2, SDA2). The deep learner not only outperformed the shallow ones and
+found in Appendix I of the supplementary material.
+The deep learner not only outperformed the shallow ones and
 previously published performance (in a statistically and qualitatively
-significant way) but reaches human performance on both the 62-class task
+significant way) but when trained with perturbed data
+reaches human performance on both the 62-class task
 and the 10-class (digits) task.
 
 \begin{figure}[ht]