writeup/nips2010_submission.tex @ 516:092dae9a5040

make the reference more compact.
author Frederic Bastien <nouiz@nouiz.org>
date Tue, 01 Jun 2010 14:08:44 -0400
parents 920a38715c90
children 460a4e78c9a4
\vspace*{-1mm}
\section{Experimental Setup}
\vspace*{-1mm}

Whereas much previous work on deep learning algorithms has been performed on
the MNIST digits classification task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
with 60~000 examples, and on variants involving 10~000
examples~\citep{Larochelle-jmlr-toappear-2008,VincentPLarochelleH2008}, we focus here
on the case of much larger training sets, from 10
to 1000 times larger. The larger datasets are obtained by first sampling from
a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
%\begin{itemize}
%\item
{\bf NIST.}
Our main source of characters is the NIST Special Database 19~\citep{Grother-1995},
widely used for training and testing character
recognition systems~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}.
The dataset is composed of 814~255 digits and characters (upper and lower case), with hand-checked classifications,
extracted from handwritten sample forms of 3600 writers. The characters are labelled with one of the 62 classes
corresponding to ``0''--``9'', ``A''--``Z'' and ``a''--``z''. The dataset contains 8 series of different complexity.
The fourth series, $hsf_4$, experimentally recognized to be the most difficult one, is recommended
by NIST as a testing set, and is used for that purpose in our work and in some previous
work~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}.
We randomly split the remainder into a training set and a validation set for
model selection. The sizes of these data sets are: 651~668 for training, 80~000 for validation,
and 82~587 for testing.
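The random split described above can be sketched as follows; the index arrays stand in for the actual examples, and the seed is an illustrative choice, not the one used in the experiments.

```python
import random

def split_nist_remainder(n_total=731668, n_valid=80000, seed=0):
    """Shuffle indices for the 731668 non-test NIST examples and split
    them into a validation part (80000) and a training part (651668).

    A sketch only: index lists stand in for the actual NIST examples.
    """
    rng = random.Random(seed)
    idx = list(range(n_total))
    rng.shuffle(idx)
    # everything past the first n_valid indices goes to training
    return idx[n_valid:], idx[:n_valid]

train_idx, valid_idx = split_nist_remainder()
```

The test set ($hsf_4$, 82~587 examples) is fixed by NIST and therefore kept out of the shuffle entirely.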
The performance reported by previous work on that dataset mostly concerns only the digits.
Here we use all the classes, in both the training and testing phases. This is especially
through preliminary experiments, and 0.1 was selected.

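Selection by preliminary experiments of this kind can be sketched as a simple argmin over candidate values on validation error; the candidate grid and the error values below are purely illustrative, not the authors' protocol.

```python
def select_by_validation(candidates, valid_error):
    """Return the candidate value with the lowest validation error.

    `valid_error` maps a candidate hyperparameter value to the
    validation error obtained with it (hypothetical numbers here).
    """
    return min(candidates, key=valid_error)

# Made-up error curve whose minimum sits at 0.1, mirroring the value
# selected in the text; these numbers are assumptions for illustration.
errors = {0.001: 0.31, 0.01: 0.22, 0.1: 0.18, 1.0: 0.45}
best = select_by_validation(errors.keys(), errors.get)  # -> 0.1
```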
{\bf Stacked Denoising Auto-Encoders (SDA).}
Various auto-encoder variants and Restricted Boltzmann Machines (RBMs)
can be used to initialize the weights of each layer of a deep MLP (with many hidden
layers)~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006},
enabling better generalization, apparently by setting the parameters in the
basin of attraction of a supervised gradient descent solution with better
generalization~\citep{Erhan+al-2010}. It is hypothesized that the
advantage brought by this procedure stems from a better prior:
on the one hand it takes advantage of the link between the input
Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
comparing Humans, three MLPs (MLP0, MLP1, MLP2) and three SDAs (SDA0, SDA1,
SDA2), along with previous results from the literature on the NIST Special
Database 19 digits test set, respectively based on ARTMAP neural
networks~\citep{Granger+al-2007}, fast nearest-neighbor
search~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002-short}, and
SVMs~\citep{Milgram+al-2005}. More detailed and complete numerical results
(figures and tables, including standard errors on the error rates) can be
found in the supplementary material. The three kinds of models differ in the
training sets used: NIST only (MLP0, SDA0), NISTP (MLP1, SDA1), or P07
(MLP2, SDA2). The deep learner not only outperformed the shallow ones and
\caption{Error bars indicate a 95\% confidence interval. 0 indicates training
on NIST, 1 on NISTP, and 2 on P07. Left: overall results
of all models on three different test sets corresponding to the three
datasets.
Right: error rates on NIST test digits only, along with previous results from the
literature~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005},
respectively based on ART, nearest neighbors, MLPs, and SVMs.}

\label{fig:error-rates-charts}
\end{figure}

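A 95\% confidence interval on a test error rate can be formed from its standard error with the usual normal approximation to the binomial; this is a standard construction assumed here, not necessarily the exact one used for the figure, and the 2\% error rate below is a made-up example.

```python
import math

def error_rate_ci(p, n, z=1.96):
    """95% confidence interval for an error rate p measured on n test
    examples, via the normal approximation to the binomial:
    standard error se = sqrt(p * (1 - p) / n), interval p +/- z * se.
    """
    se = math.sqrt(p * (1.0 - p) / n)
    return p - z * se, p + z * se

# e.g. a hypothetical 2% error rate on the 82587-example NIST test set
lo, hi = error_rate_ci(0.02, 82587)
```

With test sets this large the interval is tight, which is why differences of a fraction of a percent between models can be meaningful.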