comparison writeup/nips2010_submission.tex @ 486:877af97ee193

results and appendix section
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Mon, 31 May 2010 22:03:35 -0400
parents 6beaf3328521
children 21787ac4e5a0
Three users classified each image, allowing us
to estimate inter-human variability (shown as +/- in parentheses below).

Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
comparing Humans, three MLPs (MLP0, MLP1, MLP2) and three SDAs (SDA0, SDA1,
SDA2), along with previous results from the literature on the NIST special
database 19 digits test set, based respectively on ARTMAP neural
networks~\citep{Granger+al-2007}, fast nearest-neighbor
search~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002}, and
SVMs~\citep{Milgram+al-2005}. More detailed and complete numerical results
(figures and tables) can be found in the appendix. The three kinds of models
differ in the training sets used: NIST only (MLP0, SDA0), NISTP (MLP1, SDA1),
or P07 (MLP2, SDA2). The deep learner not only outperformed the shallow ones
and the previously published results but also reached human performance on
both the 62-class task and the 10-class (digits) task. In addition, as shown
on the left of Figure~\ref{fig:improvements-charts}, the relative improvement
in error rate brought by self-taught learning is greater for the SDA. The
left side shows the improvement to the clean NIST test set error brought by
the use of out-of-distribution examples (i.e. the perturbed examples from
NISTP or P07). The right side of Figure~\ref{fig:improvements-charts} shows
the relative improvement brought by the use of a multi-task setting, in which
the same model is trained for more classes than the target classes of
interest (i.e. training with all 62 classes when the target classes are
respectively the digits, lower-case, or upper-case characters). Again,
whereas the gain is marginal or negative for the MLP, it is substantial for
the SDA. Note that for these multi-task experiments, only the original NIST
dataset is used. For example, the MLP-digits bar shows the relative
improvement in MLP error rate on the NIST digits test set (1 - single-task
model's error / multi-task model's error). The single-task model is trained
with only 10 outputs (one per digit), seeing only digit examples, whereas the
multi-task model is trained with 62 outputs, with all 62 character classes as
examples. For the multi-task model, the digit error rate is measured by
comparing the correct digit class with the output class having the maximum
conditional probability among only the digit class outputs.

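To make this multi-task evaluation protocol concrete, the following is a
minimal NumPy sketch, not the code used for the experiments: the array
names, the helper functions, and the assumption that the first 10 of the 62
outputs correspond to the digit classes are all illustrative. It shows how
the multi-task model's digit error rate is obtained by restricting the
argmax of the conditional class probabilities to the digit outputs, and how
the single-task (digits-only) model is scored for comparison.

\begin{verbatim}
import numpy as np

# Illustrative assumption: the first 10 of the 62 multi-task outputs
# correspond to the digit classes 0-9.
DIGIT_OUTPUTS = np.arange(10)

def digit_error_rate_multitask(probs_62, y_digits):
    # probs_62: (n_examples, 62) conditional class probabilities of the
    #           62-output multi-task model on the NIST digits test set
    # y_digits: (n_examples,) true digit labels in {0, ..., 9}
    # The prediction is the most probable class among the digit outputs only.
    pred = np.argmax(probs_62[:, DIGIT_OUTPUTS], axis=1)
    return float(np.mean(pred != y_digits))

def digit_error_rate_singletask(probs_10, y_digits):
    # probs_10: (n_examples, 10) probabilities of the single-task model,
    #           trained with one output per digit on digit examples only.
    pred = np.argmax(probs_10, axis=1)
    return float(np.mean(pred != y_digits))
\end{verbatim}

The MLP-digits and SDA-digits bars of Figure~\ref{fig:improvements-charts}
then compare these two error rates, both measured on the same NIST digits
test set.
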
\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}\\
\caption{Charts corresponding to Table~\ref{tab:sda-vs-mlp-vs-humans}. Left: overall results; error bars indicate a 95\% confidence interval. Right: error rates on NIST test digits only, with results from the literature.}
\label{fig:error-rates-charts}
\end{figure}
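The exact interval construction behind the error bars is not specified in
this excerpt; assuming the standard normal approximation to a binomial
proportion, a 95\% confidence interval for an error rate $\hat{p}$ estimated
from $n$ test examples is
\[
  \hat{p} \pm 1.96 \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}} .
\]
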
%\begin{itemize}
$\bullet$ %\item
Do the good results previously obtained with deep architectures on the
MNIST digits generalize to the setting of a much larger and richer (but
similar) dataset, the NIST special database 19, with 62 classes and around
800k examples? Yes, the SDA systematically outperformed the MLP and all the
previously published results on this dataset (as far as we know), in fact
reaching human-level performance.

$\bullet$ %\item
To what extent does the perturbation of input images (e.g. adding
noise, affine transformations, background images) make the resulting