ift6266: comparison of writeup/aistats2011_cameraready.tex @ 644:e63d23c7c9fb
description: final AISTATS reviews

author:   Yoshua Bengio <bengioy@iro.umontreal.ca>
date:     Thu, 24 Mar 2011 17:05:05 -0400
parents:  8b1a0b9fecff
children: (none)
comparing 643:24d9819a810f (old) with 644:e63d23c7c9fb (new)
@@ -271,11 +271,11 @@
 
 Much previous work on deep learning had been performed on
 the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
 with 60,000 examples, and variants involving 10,000
 examples~\citep{Larochelle-jmlr-2009,VincentPLarochelleH2008-very-small}.\footnote{Fortunately, there
-are more and more exceptions, such as~\citet{RainaICML09} using a million examples.}
+are more and more exceptions, such as~\citet{RainaICML09-small} using a million examples.}
 The focus here is on much larger training sets, from 10 times
 to 1000 times larger, and with 62 classes.
 
 The first step in constructing the larger datasets (called NISTP and P07) is to sample from
 a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
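
The construction step shown in this hunk (pick a data source at random, then draw one example from it) can be sketched in a few lines of Python. This is a minimal illustration only: the loader names, the dummy return values, and the uniform choice over sources are assumptions, not the actual NISTP/P07 pipeline.

    import random

    def sample_nist():
        # Stand-in loader: the real pipeline reads character images from
        # NIST Special Database 19; here we return a dummy (image, label) pair.
        return ("nist-image", "A")

    def sample_fonts():
        return ("font-image", "b")

    def sample_captchas():
        return ("captcha-image", "3")

    SOURCES = [sample_nist, sample_fonts, sample_captchas]

    def sample_example(rng=random):
        # First step of building the larger datasets: pick a source at random
        # (uniformly here, an assumption), then draw one example from it.
        return rng.choice(SOURCES)()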
@@ -326,17 +326,17 @@
 %\begin{itemize}
 %\item
 {\bf NIST.}
 Our main source of characters is the NIST Special Database 19~\citep{Grother-1995},
 widely used for training and testing character
-recognition systems~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}.
+recognition systems~\citep{Granger+al-2007,Cortes+al-2000-small,Oliveira+al-2002-short,Milgram+al-2005}.
 The dataset is composed of 814,255 digits and characters (upper and lower case), with hand-checked classifications,
 extracted from handwritten sample forms from 3,600 writers. The characters are labelled with one of the 62 classes
 corresponding to ``0''-``9'', ``A''-``Z'' and ``a''-``z''. The dataset contains 8 parts (partitions) of varying complexity.
 The fourth partition (called $hsf_4$, 82,587 examples),
 experimentally recognized to be the most difficult one, is the one recommended
-by NIST as a testing set and is used for that purpose in our work, as well as in some previous work~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}.
+by NIST as a testing set and is used for that purpose in our work, as well as in some previous work~\citep{Granger+al-2007,Cortes+al-2000-small,Oliveira+al-2002-short,Milgram+al-2005}.
 We randomly split the remainder (731,668 examples) into a training set and a validation set for
 model selection.
 Most previous work reporting results on that dataset used only the digits.
 Here we use all the classes, in both the training and testing phases. This is especially
 useful to estimate the effect of a multi-task setting.
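
The random train/validation split mentioned near the end of this hunk can be done as below. The validation fraction and the seed are illustrative assumptions, since the excerpt does not state the authors' actual choices.

    import random

    def train_valid_split(examples, valid_fraction=0.1, seed=0):
        # Shuffle indices once, then carve off a validation set for model
        # selection; the remaining examples form the training set.
        rng = random.Random(seed)
        indices = list(range(len(examples)))
        rng.shuffle(indices)
        n_valid = int(len(indices) * valid_fraction)
        valid = [examples[i] for i in indices[:n_valid]]
        train = [examples[i] for i in indices[n_valid:]]
        return train, valid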
@@ -573,11 +573,11 @@
 %\vspace*{-3mm}
 \caption{SDAx are the {\bf deep} models. Error bars indicate a 95\% confidence interval. 0 indicates that the model was trained
 on NIST, 1 on NISTP, and 2 on P07. Left: overall results
 of all models, on the NIST and NISTP test sets.
 Right: error rates on NIST test digits only, along with the previous results from the
-literature~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}
+literature~\citep{Granger+al-2007,Cortes+al-2000-small,Oliveira+al-2002-short,Milgram+al-2005}
 respectively based on ART, nearest neighbors, MLPs, and SVMs.}
 \label{fig:error-rates-charts}
 %\vspace*{-2mm}
 \end{figure*}
 
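For reference, the 95\% error bars mentioned in the caption above are presumably the usual normal approximation for a proportion: for an error rate $\hat p$ estimated on $n$ test examples,

    \[
    \mathrm{SE}(\hat p) \;=\; \sqrt{\frac{\hat p\,(1-\hat p)}{n}},
    \qquad
    \text{95\% interval:}\quad \hat p \,\pm\, 1.96\,\mathrm{SE}(\hat p).
    \]

As a consistency check, if the first column of the results table further below is the full 82,587-example $hsf_4$ test set, then $\hat p = 0.187$ gives $\mathrm{SE} \approx 0.0013$, matching the quoted $\pm .13\%$; the table's $\pm$ values therefore appear to be one standard error each, while the figure's bars span $\pm 1.96$ standard errors.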
@@ -614,11 +614,11 @@
 Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
 comparing humans, the three MLPs (MLP0, MLP1, MLP2) and the three SDAs (SDA0, SDA1,
 SDA2), along with the previous results on the digits of the NIST Special Database
 19 test set from the literature, respectively based on ARTMAP neural
 networks~\citep{Granger+al-2007}, fast nearest-neighbor search
-~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002-short}, and SVMs
+~\citep{Cortes+al-2000-small}, MLPs~\citep{Oliveira+al-2002-short}, and SVMs
 ~\citep{Milgram+al-2005}.% More detailed and complete numerical results
 %(figures and tables, including standard errors on the error rates) can be
 %found in Appendix.
 The deep learner not only outperformed the shallow ones and
 previously published performance (in a statistically and qualitatively
@@ -828,11 +828,11 @@
 SDA2 & 18.7\% $\pm$ .13\% & 33.6\% $\pm$ .3\% & 39.9\% $\pm$ .17\% & 1.7\% $\pm$ .1\% \\ \hline
 MLP0 & 24.2\% $\pm$ .15\% & 68.8\% $\pm$ .33\% & 78.70\% $\pm$ .14\% & 3.45\% $\pm$ .15\% \\ \hline
 MLP1 & 23.0\% $\pm$ .15\% & 41.8\% $\pm$ .35\% & 90.4\% $\pm$ .1\% & 3.85\% $\pm$ .16\% \\ \hline
 MLP2 & 24.3\% $\pm$ .15\% & 46.0\% $\pm$ .35\% & 54.7\% $\pm$ .17\% & 4.85\% $\pm$ .18\% \\ \hline
 \citep{Granger+al-2007} & & & & 4.95\% $\pm$ .18\% \\ \hline
-\citep{Cortes+al-2000} & & & & 3.71\% $\pm$ .16\% \\ \hline
+\citep{Cortes+al-2000-small} & & & & 3.71\% $\pm$ .16\% \\ \hline
 \citep{Oliveira+al-2002} & & & & 2.4\% $\pm$ .13\% \\ \hline
 \citep{Milgram+al-2005} & & & & 2.1\% $\pm$ .12\% \\ \hline
 \end{tabular}
 \end{center}
 \end{table}
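
Reading the $\pm$ values in this table as one standard error each (consistent with the commented-out remark above about standard errors on the error rates), one can sketch a quick significance check of the deep learner against the best published baseline on NIST digits. The two-proportion z-test below, and the independence assumption behind it, are this sketch's, not the authors':

    import math

    def z_score(p1, se1, p2, se2):
        # z-statistic for the difference between two independent error-rate
        # estimates; the units cancel, so percentages can be used directly.
        return (p2 - p1) / math.sqrt(se1 ** 2 + se2 ** 2)

    # Last column of the table (NIST test digits, in percent):
    # SDA2 at 1.7 +/- .1 vs. the SVM result of Milgram et al. at 2.1 +/- .12.
    z = z_score(1.7, 0.1, 2.1, 0.12)
    print(f"z = {z:.2f}")  # ~2.56 > 1.96, i.e. significant at the 95% level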