writeup/nips2010_submission.tex @ 493:a194ce5a4249
difference stat. sign.
author: Yoshua Bengio <bengioy@iro.umontreal.ca>
date: Tue, 01 Jun 2010 07:55:38 -0400
parents: 19eab4daf212
children: 5764a2ae1fb5
SDA2), along with the previous results on the digits NIST special database
19 test set from the literature, based respectively on ARTMAP neural
networks~\citep{Granger+al-2007}, fast nearest-neighbor
search~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002}, and
SVMs~\citep{Milgram+al-2005}. More detailed and complete numerical results
(figures and tables, including standard errors on the error rates) can be
found in the supplementary material. The three kinds of models differ in the
training sets used: NIST only (MLP0, SDA0), NISTP (MLP1, SDA1), or P07
(MLP2, SDA2). The deep learner not only outperformed the shallow ones and
previously published performance (in a statistically and qualitatively
significant way) but also reached human performance on both the 62-class
task and the 10-class (digits) task. In addition, as shown on the left of
Figure~\ref{fig:improvements-charts}, the relative improvement in error
rate brought by self-taught learning is greater for the SDA, and these
differences with the MLP are statistically and qualitatively significant.
The left side of the figure shows the improvement to the clean NIST test
set error brought by the use of out-of-distribution examples (i.e. the
perturbed examples from NISTP or P07). The right side of
Figure~\ref{fig:improvements-charts} shows the relative improvement
brought by the use of a multi-task setting, in which the same model is
trained for more classes than the target classes of interest (i.e. training
with all 62 classes when the target classes are respectively the digits,
lower-case, or upper-case characters). Again, whereas the gain from the
multi-task setting is marginal or negative for the MLP, it is substantial
for the SDA. Note that for these multi-task experiments, only the original
NIST dataset is used. For example, the MLP-digits bar shows the relative
improvement in MLP error rate on the NIST digits test set (1 - single-task
model's error / multi-task model's error). The single-task model is
trained with only 10 outputs (one per digit), seeing only digit examples,
whereas the multi-task model is trained with 62 outputs, with all 62
character classes as examples; hence the hidden units are shared across
all tasks. For the multi-task model, the digit error rate is measured by
comparing the correct digit class with the output class that has the
maximum conditional probability among the digit-class outputs only. The
setting is similar for the other two target classes (lower-case and
upper-case characters).

\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}\\
\caption{Left: overall results; error bars indicate a 95\% confidence interval.
Right: error rates on NIST test digits only, with results from the literature.}