comparison writeup/aistats2011_cameraready.tex @ 644:e63d23c7c9fb

final AISTATS reviews
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Thu, 24 Mar 2011 17:05:05 -0400
parents 8b1a0b9fecff
children

Much previous work on deep learning had been performed on
the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
with 60,000 examples, and on variants involving 10,000
examples~\citep{Larochelle-jmlr-2009,VincentPLarochelleH2008-very-small}.\footnote{Fortunately, there
are more and more exceptions, such as~\citet{RainaICML09-small}, who use one million examples.}
The focus here is on much larger training sets, from 10 to
1000 times larger, and on 62 classes.

The first step in constructing the larger datasets (called NISTP and P07) is to sample from
a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
% [...]
%\begin{itemize}
%\item
{\bf NIST.}
Our main source of characters is the NIST Special Database 19~\citep{Grother-1995},
widely used for training and testing character
recognition systems~\citep{Granger+al-2007,Cortes+al-2000-small,Oliveira+al-2002-short,Milgram+al-2005}.
The dataset is composed of 814,255 digits and characters (upper and lower case), with hand-checked classifications,
extracted from handwritten sample forms filled out by 3,600 writers. Each character is labelled with one of the 62 classes
corresponding to ``0''--``9'', ``A''--``Z'' and ``a''--``z''. The dataset contains 8 parts (partitions) of varying complexity.
The fourth partition ($hsf_4$, with 82,587 examples) has been found empirically
to be the most difficult one; it is the partition recommended
by NIST as a test set, and we use it for that purpose, as did previous work~\citep{Granger+al-2007,Cortes+al-2000-small,Oliveira+al-2002-short,Milgram+al-2005}.
We randomly split the remainder (731,668 examples) into a training set and a validation set for
model selection.
Previous work reporting results on this dataset mostly uses only the digit classes,
whereas here we use all 62 classes, during both training and testing. This is especially
useful for estimating the effect of a multi-task setting.
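As a concrete illustration (not taken from the authors' code), such a random split can be written in a few lines; the 80/20 ratio and the NumPy-based indexing below are assumptions, since the text only states that the 731,668 remaining examples are split randomly:
\begin{verbatim}
import numpy as np

def split_train_valid(n_examples=731668, valid_fraction=0.2, seed=0):
    """Randomly partition example indices into training and validation sets.

    valid_fraction=0.2 is an assumed ratio; the paper does not give one.
    """
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_examples)           # random order over all examples
    n_valid = int(valid_fraction * n_examples)   # validation set size
    return perm[n_valid:], perm[:n_valid]        # (train indices, valid indices)

train_idx, valid_idx = split_train_valid()
\end{verbatim}
Fixing the seed makes the split reproducible across runs, which matters when the validation set is reused for model selection.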
% [...]
%\vspace*{-3mm}
\caption{SDAx are the {\bf deep} models; the suffix indicates the training set: 0 means the model was trained
on NIST, 1 on NISTP, and 2 on P07. Error bars indicate a 95\% confidence interval. Left: overall results
of all models on the NIST and NISTP test sets.
Right: error rates on the NIST test digits only, along with previous results from the
literature~\citep{Granger+al-2007,Cortes+al-2000-small,Oliveira+al-2002-short,Milgram+al-2005},
respectively based on ART, nearest neighbors, MLPs, and SVMs.}
\label{fig:error-rates-charts}
%\vspace*{-2mm}
\end{figure*}
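The 95\% confidence intervals in the caption follow from the usual binomial normal approximation; the paper does not state exactly which construction was used, so the sketch below is an assumption rather than the authors' procedure:
\begin{verbatim}
import math

def binomial_error_bars(p, n):
    """Standard error and 95% CI half-width for an error rate p
    measured on n test cases (normal approximation)."""
    se = math.sqrt(p * (1.0 - p) / n)   # standard error of the error rate
    return se, 1.96 * se                # (SE, 95% half-width)

# e.g. an 18.7% error rate on the 82,587-example hsf_4 test set:
se, hw = binomial_error_bars(0.187, 82587)
print(f"SE = {100*se:.2f}%, 95% CI half-width = {100*hw:.2f}%")
# -> SE = 0.14%, 95% CI half-width = 0.27%
\end{verbatim}
With test sets of this size the normal approximation is accurate, and the computed standard error is of the same order as the $\pm$ values in the results table below, which the commented-out appendix note describes as standard errors.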

% [...]
Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
comparing humans, the three MLPs (MLP0, MLP1, MLP2) and the three SDAs (SDA0, SDA1,
SDA2), along with previous results from the literature on the digits of the NIST
Special Database 19 test set, respectively based on ARTMAP neural
networks~\citep{Granger+al-2007}, fast nearest-neighbor
search~\citep{Cortes+al-2000-small}, MLPs~\citep{Oliveira+al-2002-short}, and
SVMs~\citep{Milgram+al-2005}.% More detailed and complete numerical results
%(figures and tables, including standard errors on the error rates) can be
%found in Appendix.
The deep learner not only outperformed the shallow ones and
previously published performance (in a statistically and qualitatively
% [...]
SDA2 & 18.7\% $\pm$ .13\% & 33.6\% $\pm$ .3\% & 39.9\% $\pm$ .17\% & 1.7\% $\pm$ .1\% \\ \hline
MLP0 & 24.2\% $\pm$ .15\% & 68.8\% $\pm$ .33\% & 78.70\% $\pm$ .14\% & 3.45\% $\pm$ .15\% \\ \hline
MLP1 & 23.0\% $\pm$ .15\% & 41.8\% $\pm$ .35\% & 90.4\% $\pm$ .1\% & 3.85\% $\pm$ .16\% \\ \hline
MLP2 & 24.3\% $\pm$ .15\% & 46.0\% $\pm$ .35\% & 54.7\% $\pm$ .17\% & 4.85\% $\pm$ .18\% \\ \hline
\citep{Granger+al-2007} & & & & 4.95\% $\pm$ .18\% \\ \hline
\citep{Cortes+al-2000-small} & & & & 3.71\% $\pm$ .16\% \\ \hline
\citep{Oliveira+al-2002-short} & & & & 2.4\% $\pm$ .13\% \\ \hline
\citep{Milgram+al-2005} & & & & 2.1\% $\pm$ .12\% \\ \hline
\end{tabular}
\end{center}
\end{table}
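To check that gaps such as SDA2's 18.7\% versus MLP0's 24.2\% are statistically meaningful, a two-proportion $z$-test can be applied. The sketch below assumes that the first column is the all-class NIST test error and that the test set is the 82,587-example $hsf_4$ partition described earlier; both models are evaluated on the same test examples, so a paired test such as McNemar's would be stricter, and this serves only as a rough check:
\begin{verbatim}
import math

def two_proportion_z(p1, p2, n):
    """z statistic for the difference of two error rates, each measured
    on n test cases (normal approximation, ignores pairing)."""
    se_diff = math.sqrt(p1*(1-p1)/n + p2*(1-p2)/n)  # SE of (p1 - p2)
    return (p1 - p2) / se_diff

# MLP0 vs SDA2 on the (assumed) 82,587-example NIST test set:
z = two_proportion_z(0.242, 0.187, 82587)
print(f"z = {z:.1f}")   # -> z = 27.3, far beyond the 1.96 threshold
\end{verbatim}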