comparison writeup/nips2010_submission.tex @ 486:877af97ee193

results and appendix section
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Mon, 31 May 2010 22:03:35 -0400
parents 6beaf3328521
children 21787ac4e5a0
Three users classified each image, allowing us
to estimate inter-human variability (shown as +/- in parentheses below).

Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
comparing Humans, three MLPs (MLP0, MLP1, MLP2) and three SDAs (SDA0, SDA1,
SDA2), along with previous results from the literature on the NIST special
database 19 digits test set, based respectively on ARTMAP neural
networks~\citep{Granger+al-2007}, fast nearest-neighbor
search~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002}, and
SVMs~\citep{Milgram+al-2005}. More detailed and complete numerical results
(figures and tables) can be found in the appendix. The three kinds of models
differ in the training sets used: NIST only (MLP0, SDA0), NISTP (MLP1, SDA1),
or P07 (MLP2, SDA2). The deep learner not only outperformed the shallow ones
and the previously published results but also reached human performance on
both the 62-class task and the 10-class (digits) task. In addition, as shown
on the left of Figure~\ref{fig:improvements-charts}, the relative improvement
in error rate brought by self-taught learning is greater for the SDA. The
left side shows the improvement to the clean NIST test set error brought by
the use of out-of-distribution examples (i.e. the perturbed examples from
NISTP or P07). The right side of Figure~\ref{fig:improvements-charts} shows
the relative improvement brought by the use of a multi-task setting, in which
the same model is trained for more classes than the target classes of
interest (i.e. training with all 62 classes when the target classes are
respectively the digits, lower-case, or upper-case characters). Again,
whereas the gain is marginal or negative for the MLP, it is substantial for
the SDA. Note that for these multi-task experiments, only the original NIST
dataset is used. For example, the MLP-digits bar shows the relative
improvement in MLP error rate on the NIST digits test set (1 - single-task
model's error / multi-task model's error). The single-task model is trained
with only 10 outputs (one per digit), seeing only digit examples, whereas the
multi-task model is trained with 62 outputs, with all 62 character classes as
examples. For the multi-task model, the digit error rate is measured by
comparing the correct digit class with the output class having the maximum
conditional probability among only the digit class outputs.

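To make this multi-task evaluation protocol concrete, the following is a
minimal NumPy sketch, not the code used for the experiments: the array
names, the helper functions, and the assumption that the first 10 of the 62
outputs correspond to the digit classes are all illustrative. It shows how
the multi-task model's digit error rate is obtained by restricting the
argmax of the conditional class probabilities to the digit outputs, and how
the single-task (digits-only) model is scored for comparison.

\begin{verbatim}
import numpy as np

# Illustrative assumption: the first 10 of the 62 multi-task outputs
# correspond to the digit classes 0-9.
DIGIT_OUTPUTS = np.arange(10)

def digit_error_rate_multitask(probs_62, y_digits):
    # probs_62: (n_examples, 62) conditional class probabilities of the
    #           62-output multi-task model on the NIST digits test set
    # y_digits: (n_examples,) true digit labels in {0, ..., 9}
    # The prediction is the most probable class among the digit outputs only.
    pred = np.argmax(probs_62[:, DIGIT_OUTPUTS], axis=1)
    return float(np.mean(pred != y_digits))

def digit_error_rate_singletask(probs_10, y_digits):
    # probs_10: (n_examples, 10) probabilities of the single-task model,
    #           trained with one output per digit on digit examples only.
    pred = np.argmax(probs_10, axis=1)
    return float(np.mean(pred != y_digits))
\end{verbatim}

The MLP-digits and SDA-digits bars of Figure~\ref{fig:improvements-charts}
then compare these two error rates, both measured on the same NIST digits
test set.
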
\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}\\
\caption{Charts corresponding to Table~\ref{tab:sda-vs-mlp-vs-humans}. Left: overall results; error bars indicate a 95\% confidence interval. Right: error rates on NIST test digits only, with results from the literature.}
\label{fig:error-rates-charts}
\end{figure}
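The exact interval construction behind the error bars is not specified in
this excerpt; assuming the standard normal approximation to a binomial
proportion, a 95\% confidence interval for an error rate $\hat{p}$ estimated
from $n$ test examples is
\[
  \hat{p} \pm 1.96 \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}} .
\]
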
%\begin{itemize}
$\bullet$ %\item
Do the good results previously obtained with deep architectures on the
MNIST digits generalize to the setting of a much larger and richer (but
similar) dataset, the NIST special database 19, with 62 classes and around
800k examples? Yes, the SDA systematically outperformed the MLP and all the
previously published results on this dataset (as far as we know), in fact
reaching human-level performance.

$\bullet$ %\item
To what extent does the perturbation of input images (e.g. adding
noise, affine transformations, background images) make the resulting