comparison writeup/nips2010_submission.tex @ 489:ee9836baade3
merge

author:   dumitru@dumitru.mtv.corp.google.com
date:     Mon, 31 May 2010 19:07:59 -0700
parents:  6c9ff48e15cd 21787ac4e5a0
children: 19eab4daf212
Three users classified each image, allowing us to estimate inter-human
variability (shown as $\pm$ in parentheses below).

Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
comparing Humans, three MLPs (MLP0, MLP1, MLP2) and three SDAs (SDA0, SDA1,
SDA2), along with the previous results on the digits NIST special database
19 test set from the literature, respectively based on ARTMAP neural
networks~\citep{Granger+al-2007}, fast nearest-neighbor
search~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002}, and
SVMs~\citep{Milgram+al-2005}. More detailed and complete numerical results
(figures and tables) can be found in the appendix. The three kinds of models
differ in the training sets used: NIST only (MLP0, SDA0), NISTP (MLP1,
SDA1), or P07 (MLP2, SDA2). The deep learner not only outperformed the
shallow ones and previously published results but also reached human
performance on both the 62-class task and the 10-class (digits) task. In
addition, as shown on the left of Figure~\ref{fig:improvements-charts},
the relative improvement in error rate brought by self-taught learning is
greater for the SDA. The left side shows the improvement to the clean NIST
test set error brought by the use of out-of-distribution examples (i.e. the
perturbed examples from NISTP or P07). The right side of
Figure~\ref{fig:improvements-charts} shows the relative improvement
brought by the use of a multi-task setting, in which the same model is
trained for more classes than the target classes of interest (i.e. training
with all 62 classes when the target classes are respectively the digits,
lower-case, or upper-case characters). Again, whereas the gain is marginal
or negative for the MLP, it is substantial for the SDA. Note that for
these multi-task experiments, only the original NIST dataset is used. For
example, the MLP-digits bar shows the relative improvement in MLP error
rate on the NIST digits test set (1 - single-task model's error /
multi-task model's error). The single-task model is trained with only 10
outputs (one per digit), seeing only digit examples, whereas the multi-task
model is trained with 62 outputs, with all 62 character classes as
examples. For the multi-task model, the digit error rate is measured by
comparing the correct digit class with the output class associated with the
maximum conditional probability among only the digit class outputs.

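As a concrete illustration of this restricted decision rule (our sketch, not
the actual experimental code), assume the 62 outputs are ordered so that the
first 10 columns correspond to the digit classes:

\begin{verbatim}
import numpy as np

def multitask_digit_error(probs, digit_targets):
    # probs: (n_examples, 62) conditional class probabilities P(class|x);
    # digit_targets: (n_examples,) true digit labels in 0..9.
    # Assumed (hypothetical) layout: columns 0..9 are the digit classes.
    digit_probs = probs[:, :10]          # keep only the digit outputs
    predictions = np.argmax(digit_probs, axis=1)
    return float(np.mean(predictions != digit_targets))
\end{verbatim}
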
\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}\\
\caption{Charts corresponding to Table~\ref{tab:sda-vs-mlp-vs-humans}. Left: overall results; error bars indicate a 95\% confidence interval. Right: error rates on NIST test digits only, with results from the literature.}
\label{fig:error-rates-charts}
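
The 95\% confidence intervals shown as error bars can be obtained with the
usual normal approximation for a binomial proportion; a minimal sketch,
assuming independent test examples (our illustration, not necessarily the
exact procedure used for the figure):

\begin{verbatim}
import math

def error_rate_ci95(error_rate, n_test):
    # Normal-approximation 95% confidence interval for an error rate
    # estimated from n_test independent test examples.
    half_width = 1.96 * math.sqrt(error_rate * (1.0 - error_rate) / n_test)
    return (error_rate - half_width, error_rate + half_width)
\end{verbatim}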
\section{Conclusions}
\vspace*{-1mm}

The conclusions are positive for all the questions asked in the introduction.
%\begin{itemize}

$\bullet$ %\item
Do the good results previously obtained with deep architectures on the
MNIST digits generalize to the setting of a much larger and richer (but similar)
dataset, the NIST special database 19, with 62 classes and around 800k examples?
Yes, the SDA systematically outperformed the MLP and all the previously
published results on this dataset (as far as we know), in fact reaching human-level
performance.

$\bullet$ %\item
To what extent does the perturbation of input images (e.g. adding
noise, affine transformations, background images) make the resulting