% writeup/nips2010_submission.tex @ changeset 493:a194ce5a4249
% (parent 491:19eab4daf212, child 5764a2ae1fb5)
% Commit message: difference stat. sign.
% Author: Yoshua Bengio <bengioy@iro.umontreal.ca>
% Date: Tue, 01 Jun 2010 07:55:38 -0400
SDA2), along with the previous results on the digits NIST special database
19 test set from the literature, respectively based on ARTMAP neural
networks~\citep{Granger+al-2007}, fast nearest-neighbor
search~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002}, and
SVMs~\citep{Milgram+al-2005}. More detailed and complete numerical results
(figures and tables, including standard errors on the error rates) can be
found in the supplementary material. The three kinds of models differ in
the training sets used: NIST only (MLP0, SDA0), NISTP (MLP1, SDA1), or P07
(MLP2, SDA2). The deep learner not only outperforms the shallow ones and
previously published performance (in a statistically and qualitatively
significant way) but also reaches human performance on both the 62-class
task and the 10-class (digits) task. In addition, as shown in the left
panel of Figure~\ref{fig:fig:improvements-charts}, the relative improvement
in error rate brought by self-taught learning is greater for the SDA, and
these differences with the MLP are statistically and qualitatively
significant. The left side of the figure shows the improvement to the clean
NIST test set error brought by the use of out-of-distribution examples
(i.e.\ the perturbed examples from NISTP or P07).
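The standard errors and 95\% confidence intervals behind these comparisons can be recovered from raw test counts. A minimal sketch, assuming the usual normal approximation to the binomial; the function name and the example counts below are illustrative, not values from the paper:

```python
import math

def error_rate_ci(n_errors, n_test, z=1.96):
    """Error rate with a normal-approximation (binomial) confidence interval.

    Illustrative helper, not from the paper; z=1.96 gives an
    approximate 95% interval, as used for error bars.
    """
    p = n_errors / n_test
    se = math.sqrt(p * (1.0 - p) / n_test)  # standard error of the rate
    return p, (p - z * se, p + z * se)

# Made-up counts for illustration: 1850 errors on a 100000-example test set.
p, (lo, hi) = error_rate_ci(1850, 100000)
```

With test sets of this magnitude the interval is tight, which is why even modest differences between models can be statistically significant.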
The right side of Figure~\ref{fig:fig:improvements-charts} shows the
relative improvement brought by the use of a multi-task setting, in which
the same model is trained for more classes than the target classes of
interest (i.e.\ training with all 62 classes when the target classes are
respectively the digits, lower-case, or upper-case characters). Again,
whereas the gain from the multi-task setting is marginal or negative for
the MLP, it is substantial for the SDA. Note that for these multi-task
experiments, only the original NIST dataset is used. For example, the
MLP-digits bar shows the relative improvement in MLP error rate on the
NIST digits test set (1 - multi-task model's error / single-task model's
error). The single-task model is trained with only 10 outputs (one per
digit), seeing only digit examples, whereas the multi-task model is
trained with 62 outputs, with all 62 character classes as examples. Hence
the hidden units are shared across all tasks. For the multi-task model,
the digit error rate is measured by comparing the correct digit class with
the output class associated with the maximum conditional probability among
only the digit class outputs. The setting is similar for the other two
target classes (lower-case and upper-case characters).
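The restricted-argmax evaluation described in this paragraph can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes the 10 digit outputs occupy indices 0-9 of the 62 output units, and it defines relative improvement so that it is positive when the multi-task error is lower.

```python
# Sketch of the multi-task digit evaluation: the 62-way model's
# prediction for a digit example is the argmax of its conditional
# probabilities restricted to the digit outputs only.

def digit_error_rate(probs, labels, digit_ids=tuple(range(10))):
    """probs: per-example 62-long probability vectors; labels: true digit ids.

    digit_ids (indices 0-9 here) is an assumption about the output layout.
    """
    errors = 0
    for p, y in zip(probs, labels):
        pred = max(digit_ids, key=lambda c: p[c])  # ignore the 52 letter outputs
        errors += int(pred != y)
    return errors / len(labels)

def relative_improvement(err_single, err_multi):
    """Relative improvement of the multi-task model over the single-task
    one; positive when the multi-task error is lower."""
    return 1.0 - err_multi / err_single
```

The MLP-digits bar, for instance, would correspond to `relative_improvement(err_single, err_multi)` with both error rates measured on the NIST digits test set.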

\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}\\
\caption{Left: overall results; error bars indicate a 95\% confidence interval.
Right: error rates on NIST test digits only, with results from the literature.}