# HG changeset patch
# User dumitru@dumitru.mtv.corp.google.com
# Date 1275358079 25200
# Node ID ee9836baade3f2862be60ff77b9ad31e6a7b12fd
# Parent  6c9ff48e15cdfc0186e2c6c0509868e37c34a974
# Parent  21787ac4e5a06a3e24ba8d5addd9926201c89129
merge

diff -r 6c9ff48e15cd -r ee9836baade3 writeup/nips2010_submission.tex
--- a/writeup/nips2010_submission.tex	Mon May 31 19:07:35 2010 -0700
+++ b/writeup/nips2010_submission.tex	Mon May 31 19:07:59 2010 -0700
@@ -483,36 +483,36 @@
 
 Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
 comparing Humans, three MLPs (MLP0, MLP1, MLP2) and three SDAs (SDA0, SDA1,
-SDA2), along with the previous results on the digits NIST special database 19
-test set from the
-literature
-respectively based on ARTMAP neural networks
-~\citep{Granger+al-2007}, fast nearest-neighbor search
-~\citep{Cortes+al-2000}, MLPs
-~\citep{Oliveira+al-2002}, and SVMs
-~\citep{Milgram+al-2005}.
-More detailed and complete numerical results (figures and tables) 
-can be found in the appendix.  The 3 kinds of model differ in the
-training sets used: NIST only (MLP0,SDA0), NISTP (MLP1, SDA1),
-or P07 (MLP2, SDA2). The deep learner not only outperformed
-the shallow ones and previously published performance 
-but reaches human performance on both the 62-class
-task and the 10-class (digits) task. In addition, as shown
-in the left of Figure~\ref{fig:fig:improvements-charts},
-the relative improvement in error rate brought by
-self-taught learning is greater for the SDA. The left
-side shows the improvement to the clean NIST test set error
-brought by the use of out-of-distribution
-examples (i.e. the perturbed examples examples from NISTP
-or P07). The right side of Figure~\ref{fig:fig:improvements-charts}
-shows the relative improvement brought by the use
-of a multi-task setting, in which the same model is trained
-for more classes than the target classes of interest
-(i.e. training with all 62 classes when the target classes
-are respectively the digits, lower-case, or upper-case
-characters). Again, whereas the gain is marginal
-or negative for the MLP, it is substantial for the SDA.
-
+SDA2), along with previously published results on the digits test set of
+NIST special database 19, based respectively on ARTMAP neural
+networks~\citep{Granger+al-2007}, fast nearest-neighbor
+search~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002}, and
+SVMs~\citep{Milgram+al-2005}.  More detailed and complete numerical results
+(figures and tables) can be found in the appendix.  The three models of each
+kind differ in the training set used: NIST only (MLP0, SDA0), NISTP (MLP1,
+SDA1), or P07 (MLP2, SDA2). The deep learner not only outperformed the
+shallow ones and the previously published results but also reached human
+performance on both the 62-class task and the 10-class (digits) task. In
+addition, as shown on the left of Figure~\ref{fig:fig:improvements-charts},
+the relative improvement in error rate brought by self-taught learning is
+greater for the SDA. The left side shows the improvement to the clean NIST
+test set error brought by the use of out-of-distribution examples (i.e., the
+perturbed examples from NISTP or P07). The right side of
+Figure~\ref{fig:fig:improvements-charts} shows the relative improvement
+brought by the use of a multi-task setting, in which the same model is
+trained for more classes than the target classes of interest (i.e. training
+with all 62 classes when the target classes are respectively the digits,
+lower-case, or upper-case characters). Again, whereas the gain is marginal
+or negative for the MLP, it is substantial for the SDA.  Note that for
+these multi-task experiments, only the original NIST dataset is used. For
+example, the MLP-digits bar shows the relative improvement in MLP error
+rate on the NIST digits test set (1 - single-task model's error /
+multi-task model's error).  The single-task model is trained with only 10
+outputs (one per digit), seeing only digit examples, whereas the multi-task
+model is trained with 62 outputs, with all 62 character classes as
+examples.  For the multi-task model, the digit error rate is measured by
+comparing the correct digit class with the output class associated with
+the maximum conditional probability among only the digit-class outputs.
 
 \begin{figure}[h]
 \resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}\\
@@ -566,11 +566,13 @@
 
 The conclusions are positive for all the questions asked in the introduction.
 %\begin{itemize}
+
 $\bullet$ %\item 
 Do the good results previously obtained with deep architectures on the
 MNIST digits generalize to the setting of a much larger and richer (but similar)
 dataset, the NIST special database 19, with 62 classes and around 800k examples?
-Yes, the SDA systematically outperformed the MLP, in fact reaching human-level
+Yes, the SDA systematically outperformed the MLP and all the previously
+published results on this dataset (as far as we know), in fact reaching human-level
 performance.
 
 $\bullet$ %\item