diff writeup/nips2010_submission.tex @ 489:ee9836baade3
merge
author | dumitru@dumitru.mtv.corp.google.com |
---|---|
date | Mon, 31 May 2010 19:07:59 -0700 |
parents | 6c9ff48e15cd 21787ac4e5a0 |
children | 19eab4daf212 |
--- a/writeup/nips2010_submission.tex Mon May 31 19:07:35 2010 -0700
+++ b/writeup/nips2010_submission.tex Mon May 31 19:07:59 2010 -0700
@@ -483,36 +483,36 @@
 Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
 comparing Humans, three MLPs (MLP0, MLP1, MLP2) and three SDAs (SDA0, SDA1,
-SDA2), along with the previous results on the digits NIST special database 19
-test set from the
-literature
-respectively based on ARTMAP neural networks
-~\citep{Granger+al-2007}, fast nearest-neighbor search
-~\citep{Cortes+al-2000}, MLPs
-~\citep{Oliveira+al-2002}, and SVMs
-~\citep{Milgram+al-2005}.
-More detailed and complete numerical results (figures and tables)
-can be found in the appendix. The 3 kinds of model differ in the
-training sets used: NIST only (MLP0,SDA0), NISTP (MLP1, SDA1),
-or P07 (MLP2, SDA2). The deep learner not only outperformed
-the shallow ones and previously published performance
-but reaches human performance on both the 62-class
-task and the 10-class (digits) task. In addition, as shown
-in the left of Figure~\ref{fig:fig:improvements-charts},
-the relative improvement in error rate brought by
-self-taught learning is greater for the SDA. The left
-side shows the improvement to the clean NIST test set error
-brought by the use of out-of-distribution
-examples (i.e. the perturbed examples examples from NISTP
-or P07). The right side of Figure~\ref{fig:fig:improvements-charts}
-shows the relative improvement brought by the use
-of a multi-task setting, in which the same model is trained
-for more classes than the target classes of interest
-(i.e. training with all 62 classes when the target classes
-are respectively the digits, lower-case, or upper-case
-characters). Again, whereas the gain is marginal
-or negative for the MLP, it is substantial for the SDA.
-
+SDA2), along with the previous results on the digits NIST special database
+19 test set from the literature respectively based on ARTMAP neural
+networks ~\citep{Granger+al-2007}, fast nearest-neighbor search
+~\citep{Cortes+al-2000}, MLPs ~\citep{Oliveira+al-2002}, and SVMs
+~\citep{Milgram+al-2005}. More detailed and complete numerical results
+(figures and tables) can be found in the appendix. The 3 kinds of model
+differ in the training sets used: NIST only (MLP0,SDA0), NISTP (MLP1,
+SDA1), or P07 (MLP2, SDA2). The deep learner not only outperformed the
+shallow ones and previously published performance but reaches human
+performance on both the 62-class task and the 10-class (digits) task. In
+addition, as shown in the left of Figure~\ref{fig:fig:improvements-charts},
+the relative improvement in error rate brought by self-taught learning is
+greater for the SDA. The left side shows the improvement to the clean NIST
+test set error brought by the use of out-of-distribution examples (i.e. the
+perturbed examples examples from NISTP or P07). The right side of
+Figure~\ref{fig:fig:improvements-charts} shows the relative improvement
+brought by the use of a multi-task setting, in which the same model is
+trained for more classes than the target classes of interest (i.e. training
+with all 62 classes when the target classes are respectively the digits,
+lower-case, or upper-case characters). Again, whereas the gain is marginal
+or negative for the MLP, it is substantial for the SDA. Note that for
+these multi-task experiment, only the original NIST dataset is used. For
+example, the MLP-digits bar shows the relative improvement in MLP error
+rate on the NIST digits test set (1 - single-task model's error /
+multi-task model's error). The single-task model is trained with only 10
+outputs (one per digit), seeing only digit examples, whereas the multi-task
+model is trained with 62 outputs, with all 62 character classes as
+examples. For the multi-task model, the digit error rate is measured by
+comparing the correct digit class with the output class associated with
+the maximum conditional probability among only the digit classes outputs.
 \begin{figure}[h]
 \resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}\\
@@ -566,11 +566,13 @@
 The conclusions are positive for all the questions asked in the introduction.
 %\begin{itemize}
+
 $\bullet$ %\item
 Do the good results previously obtained with deep architectures on the
 MNIST digits generalize to the setting of a much larger and richer (but similar)
 dataset, the NIST special database 19, with 62 classes and around 800k examples?
-Yes, the SDA systematically outperformed the MLP, in fact reaching human-level
+Yes, the SDA systematically outperformed the MLP and all the previously
+published results on this dataset (as far as we know), in fact reaching human-level
 performance.
 $\bullet$ %\item
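
The added text in the first hunk describes how the 62-output multi-task model is scored on the digit test set: the predicted class is the digit output with the highest conditional probability, and non-digit outputs are ignored. The following is a minimal Python sketch of that evaluation rule, not the authors' code. The names `predict_digit` and `digit_error_rate` are hypothetical, and the assumption that the digit classes occupy output indices 0 to 9 is made only for illustration.

```python
from typing import Sequence

# Assumption (not stated in the source): among the 62 outputs,
# indices 0-9 correspond to the ten digit classes.
DIGIT_CLASSES = tuple(range(10))


def predict_digit(probs: Sequence[float],
                  digit_classes: Sequence[int] = DIGIT_CLASSES) -> int:
    """Return the digit class with the largest conditional probability.

    Outputs for non-digit classes are never considered, even if one of
    them has the highest probability overall.
    """
    return max(digit_classes, key=lambda c: probs[c])


def digit_error_rate(all_probs: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     digit_classes: Sequence[int] = DIGIT_CLASSES) -> float:
    """Fraction of digit examples whose restricted prediction is wrong."""
    if len(all_probs) != len(labels) or not labels:
        raise ValueError("need one label per example and at least one example")
    errors = sum(
        predict_digit(p, digit_classes) != y
        for p, y in zip(all_probs, labels)
    )
    return errors / len(labels)


if __name__ == "__main__":
    # The global maximum is a non-digit class (index 40), but the
    # restricted prediction is the best digit class (index 3).
    probs = [0.0] * 62
    probs[40], probs[3], probs[7] = 0.6, 0.3, 0.1
    print(predict_digit(probs))
```

Restricting the argmax to the digit outputs means a multi-task model is not penalized on the digit task for probability mass placed on letter classes, which makes its digit error rate comparable to that of a 10-output single-task model.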