ift6266: comparison of writeup/aistats2011_cameraready.tex @ 644:e63d23c7c9fb
description: final AISTATS reviews

author:   Yoshua Bengio <bengioy@iro.umontreal.ca>
date:     Thu, 24 Mar 2011 17:05:05 -0400
parents:  8b1a0b9fecff
children: (none)
comparing 643:24d9819a810f (old) with 644:e63d23c7c9fb (new)
@@ -271,11 +271,11 @@
 
 Much previous work on deep learning had been performed on
 the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
 with 60,000 examples, and variants involving 10,000
 examples~\citep{Larochelle-jmlr-2009,VincentPLarochelleH2008-very-small}.\footnote{Fortunately, there
-are more and more exceptions, such as~\citet{RainaICML09} using a million examples.}
+are more and more exceptions, such as~\citet{RainaICML09-small} using a million examples.}
 The focus here is on much larger training sets, from 10 times
 to 1000 times larger, and with 62 classes.
 
 The first step in constructing the larger datasets (called NISTP and P07) is to sample from
 a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
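
The construction step shown in this hunk (pick a data source at random, then draw one example from it) can be sketched in a few lines of Python. This is a minimal illustration only: the loader names, the dummy return values, and the uniform choice over sources are assumptions, not the actual NISTP/P07 pipeline.

    import random

    def sample_nist():
        # Stand-in loader: the real pipeline reads character images from
        # NIST Special Database 19; here we return a dummy (image, label) pair.
        return ("nist-image", "A")

    def sample_fonts():
        return ("font-image", "b")

    def sample_captchas():
        return ("captcha-image", "3")

    SOURCES = [sample_nist, sample_fonts, sample_captchas]

    def sample_example(rng=random):
        # First step of building the larger datasets: pick a source at random
        # (uniformly here, an assumption), then draw one example from it.
        return rng.choice(SOURCES)()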
@@ -326,17 +326,17 @@
 %\begin{itemize}
 %\item
 {\bf NIST.}
 Our main source of characters is the NIST Special Database 19~\citep{Grother-1995},
 widely used for training and testing character
-recognition systems~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}.
+recognition systems~\citep{Granger+al-2007,Cortes+al-2000-small,Oliveira+al-2002-short,Milgram+al-2005}.
 The dataset is composed of 814,255 digits and characters (upper and lower case), with hand-checked classifications,
 extracted from handwritten sample forms from 3,600 writers. The characters are labelled with one of the 62 classes
 corresponding to ``0''-``9'', ``A''-``Z'' and ``a''-``z''. The dataset contains 8 parts (partitions) of varying complexity.
 The fourth partition (called $hsf_4$, 82,587 examples),
 experimentally recognized to be the most difficult one, is the one recommended
-by NIST as a testing set and is used for that purpose in our work, as well as in some previous work~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}.
+by NIST as a testing set and is used for that purpose in our work, as well as in some previous work~\citep{Granger+al-2007,Cortes+al-2000-small,Oliveira+al-2002-short,Milgram+al-2005}.
 We randomly split the remainder (731,668 examples) into a training set and a validation set for
 model selection.
 Most previous work reporting results on that dataset used only the digits.
 Here we use all the classes, in both the training and testing phases. This is especially
 useful to estimate the effect of a multi-task setting.
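
The random train/validation split mentioned near the end of this hunk can be done as below. The validation fraction and the seed are illustrative assumptions, since the excerpt does not state the authors' actual choices.

    import random

    def train_valid_split(examples, valid_fraction=0.1, seed=0):
        # Shuffle indices once, then carve off a validation set for model
        # selection; the remaining examples form the training set.
        rng = random.Random(seed)
        indices = list(range(len(examples)))
        rng.shuffle(indices)
        n_valid = int(len(indices) * valid_fraction)
        valid = [examples[i] for i in indices[:n_valid]]
        train = [examples[i] for i in indices[n_valid:]]
        return train, valid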
@@ -573,11 +573,11 @@
 %\vspace*{-3mm}
 \caption{SDAx are the {\bf deep} models. Error bars indicate a 95\% confidence interval. 0 indicates that the model was trained
 on NIST, 1 on NISTP, and 2 on P07. Left: overall results
 of all models, on the NIST and NISTP test sets.
 Right: error rates on NIST test digits only, along with the previous results from the
-literature~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}
+literature~\citep{Granger+al-2007,Cortes+al-2000-small,Oliveira+al-2002-short,Milgram+al-2005}
 respectively based on ART, nearest neighbors, MLPs, and SVMs.}
 \label{fig:error-rates-charts}
 %\vspace*{-2mm}
 \end{figure*}
 
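For reference, the 95\% error bars mentioned in the caption above are presumably the usual normal approximation for a proportion: for an error rate $\hat p$ estimated on $n$ test examples,

    \[
    \mathrm{SE}(\hat p) \;=\; \sqrt{\frac{\hat p\,(1-\hat p)}{n}},
    \qquad
    \text{95\% interval:}\quad \hat p \,\pm\, 1.96\,\mathrm{SE}(\hat p).
    \]

As a consistency check, if the first column of the results table further below is the full 82,587-example $hsf_4$ test set, then $\hat p = 0.187$ gives $\mathrm{SE} \approx 0.0013$, matching the quoted $\pm .13\%$; the table's $\pm$ values therefore appear to be one standard error each, while the figure's bars span $\pm 1.96$ standard errors.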
@@ -614,11 +614,11 @@
 Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
 comparing humans, the three MLPs (MLP0, MLP1, MLP2) and the three SDAs (SDA0, SDA1,
 SDA2), along with the previous results on the digits of the NIST Special Database
 19 test set from the literature, respectively based on ARTMAP neural
 networks~\citep{Granger+al-2007}, fast nearest-neighbor search
-~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002-short}, and SVMs
+~\citep{Cortes+al-2000-small}, MLPs~\citep{Oliveira+al-2002-short}, and SVMs
 ~\citep{Milgram+al-2005}.% More detailed and complete numerical results
 %(figures and tables, including standard errors on the error rates) can be
 %found in Appendix.
 The deep learner not only outperformed the shallow ones and
 previously published performance (in a statistically and qualitatively
@@ -828,11 +828,11 @@
 SDA2 & 18.7\% $\pm$ .13\% & 33.6\% $\pm$ .3\% & 39.9\% $\pm$ .17\% & 1.7\% $\pm$ .1\% \\ \hline
 MLP0 & 24.2\% $\pm$ .15\% & 68.8\% $\pm$ .33\% & 78.70\% $\pm$ .14\% & 3.45\% $\pm$ .15\% \\ \hline
 MLP1 & 23.0\% $\pm$ .15\% & 41.8\% $\pm$ .35\% & 90.4\% $\pm$ .1\% & 3.85\% $\pm$ .16\% \\ \hline
 MLP2 & 24.3\% $\pm$ .15\% & 46.0\% $\pm$ .35\% & 54.7\% $\pm$ .17\% & 4.85\% $\pm$ .18\% \\ \hline
 \citep{Granger+al-2007} & & & & 4.95\% $\pm$ .18\% \\ \hline
-\citep{Cortes+al-2000} & & & & 3.71\% $\pm$ .16\% \\ \hline
+\citep{Cortes+al-2000-small} & & & & 3.71\% $\pm$ .16\% \\ \hline
 \citep{Oliveira+al-2002} & & & & 2.4\% $\pm$ .13\% \\ \hline
 \citep{Milgram+al-2005} & & & & 2.1\% $\pm$ .12\% \\ \hline
 \end{tabular}
 \end{center}
 \end{table}
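
Reading the $\pm$ values in this table as one standard error each (consistent with the commented-out remark above about standard errors on the error rates), one can sketch a quick significance check of the deep learner against the best published baseline on NIST digits. The two-proportion z-test below, and the independence assumption behind it, are this sketch's, not the authors':

    import math

    def z_score(p1, se1, p2, se2):
        # z-statistic for the difference between two independent error-rate
        # estimates; the units cancel, so percentages can be used directly.
        return (p2 - p1) / math.sqrt(se1 ** 2 + se2 ** 2)

    # Last column of the table (NIST test digits, in percent):
    # SDA2 at 1.7 +/- .1 vs. the SVM result of Milgram et al. at 2.1 +/- .12.
    z = z_score(1.7, 0.1, 2.1, 0.12)
    print(f"z = {z:.2f}")  # ~2.56 > 1.96, i.e. significant at the 95% level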