comparison writeup/nips2010_submission.tex @ 516:092dae9a5040
make the reference more compact.

author   Frederic Bastien <nouiz@nouiz.org>
date     Tue, 01 Jun 2010 14:08:44 -0400
parents  920a38715c90
children 460a4e78c9a4
comparing 515:4a94be41b550 with 516:092dae9a5040
\vspace*{-1mm}
\section{Experimental Setup}
\vspace*{-1mm}

Whereas much previous work on deep learning algorithms had been performed on
the MNIST digits classification task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
with 60~000 examples, and variants involving 10~000
examples~\citep{Larochelle-jmlr-toappear-2008,VincentPLarochelleH2008}, we want
to focus here on the case of much larger training sets, from 10 times to
1000 times larger. The larger datasets are obtained by first sampling from
a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
%\begin{itemize}
%\item
{\bf NIST.}
Our main source of characters is the NIST Special Database 19~\citep{Grother-1995},
widely used for training and testing character
recognition systems~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}.
The dataset is composed of 814~255 digits and characters (upper and lower case), with hand-checked classifications,
extracted from handwritten sample forms of 3~600 writers. The characters are labelled with one of the 62 classes
corresponding to ``0''-``9'', ``A''-``Z'' and ``a''-``z''. The dataset contains 8 series of different complexity.
The fourth series, $hsf_4$, experimentally recognized to be the most difficult one, is recommended
by NIST as a testing set and is used in our work and in some previous work~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}
for that purpose. We randomly split the remainder into a training set and a validation set for
model selection. The sizes of these data sets are: 651~668 for training, 80~000 for validation,
and 82~587 for testing.
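The split described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the set sizes (651,668 training, 80,000 validation, with the 82,587-example $hsf_4$ series already held out for testing) come from the text, while the function name and RNG seed are assumptions.

```python
import numpy as np

# Sketch of the random train/validation split of the NIST remainder
# (everything except the hsf_4 test series).  Sizes are from the paper;
# the seed is an arbitrary assumption for reproducibility.
def split_nist_remainder(n_remaining=651668 + 80000, n_valid=80000, seed=0):
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_remaining)       # shuffle example indices
    return perm[n_valid:], perm[:n_valid]     # (train_idx, valid_idx)

train_idx, valid_idx = split_nist_remainder()
print(len(train_idx), len(valid_idx))  # 651668 80000
```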
The results reported by previous work on that dataset mostly use only the digits.
Here we use all the classes, in both the training and testing phases. This is especially
through preliminary experiments, and 0.1 was selected.

{\bf Stacked Denoising Auto-Encoders (SDA).}
Various auto-encoder variants and Restricted Boltzmann Machines (RBMs)
can be used to initialize the weights of each layer of a deep MLP (with many hidden
layers)~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006},
enabling better generalization, apparently by setting the parameters in the
basin of attraction of a supervised gradient descent solution that yields better
generalization~\citep{Erhan+al-2010}. It is hypothesized that the
advantage brought by this procedure stems from a better prior,
on the one hand taking advantage of the link between the input
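A single denoising auto-encoder layer of the kind stacked here can be sketched in a few lines of numpy. This is a hedged illustration, not the paper's implementation: the hidden size, learning rate, and epoch count are arbitrary assumptions (the paper does use a corruption level of 0.1, as stated above), and tied weights with cross-entropy reconstruction follow the standard formulation of~\citet{VincentPLarochelleH2008}.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pre-train one denoising auto-encoder layer: corrupt the input, encode,
# decode with tied weights, and descend the cross-entropy reconstruction
# loss.  The learned (W, b_h) would initialize one layer of the deep MLP.
def pretrain_dae_layer(X, n_hidden=50, corruption=0.1, lr=0.1,
                       n_epochs=5, seed=0):
    rng = np.random.RandomState(seed)
    n_in = X.shape[1]
    W = rng.normal(scale=0.01, size=(n_in, n_hidden))
    b_h = np.zeros(n_hidden)   # hidden (encoder) bias
    b_v = np.zeros(n_in)       # visible (decoder) bias
    for _ in range(n_epochs):
        # Corrupt the input by zeroing a random fraction of entries.
        mask = rng.binomial(1, 1.0 - corruption, size=X.shape)
        X_tilde = X * mask
        H = sigmoid(X_tilde @ W + b_h)      # encode
        X_hat = sigmoid(H @ W.T + b_v)      # decode (tied weights)
        # Hand-derived gradients of the cross-entropy reconstruction loss.
        d_out = X_hat - X                   # dL/d(decoder pre-activation)
        d_hid = (d_out @ W) * H * (1 - H)   # backprop through encoder
        W -= lr * (X_tilde.T @ d_hid + d_out.T @ H) / len(X)
        b_h -= lr * d_hid.mean(axis=0)
        b_v -= lr * d_out.mean(axis=0)
    return W, b_h

X = np.random.RandomState(1).rand(20, 8)   # toy data in [0, 1]
W, b_h = pretrain_dae_layer(X)
print(W.shape, b_h.shape)  # (8, 50) (50,)
```

Stacking then repeats this on the hidden representation of the previous layer before the final supervised fine-tuning pass.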
Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
comparing Humans, three MLPs (MLP0, MLP1, MLP2) and three SDAs (SDA0, SDA1,
SDA2), along with previous results from the literature on the digits of the
NIST Special Database 19 test set, respectively based on ARTMAP neural
networks~\citep{Granger+al-2007}, fast nearest-neighbor
search~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002-short}, and
SVMs~\citep{Milgram+al-2005}. More detailed and complete numerical results
(figures and tables, including standard errors on the error rates) can be
found in the supplementary material. The three kinds of models differ in the
training sets used: NIST only (MLP0, SDA0), NISTP (MLP1, SDA1), or P07
(MLP2, SDA2). The deep learner not only outperformed the shallow ones and
\caption{Error bars indicate a 95\% confidence interval. 0 indicates training
on NIST, 1 on NISTP, and 2 on P07. Left: overall results
of all models, on 3 different test sets corresponding to the three
datasets.
Right: error rates on NIST test digits only, along with previous results from
the literature~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005},
respectively based on ART, nearest neighbors, MLPs, and SVMs.}

\label{fig:error-rates-charts}
\end{figure}

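The 95\% confidence intervals in the figure can be derived from the standard errors mentioned above. As a hedged sketch (not necessarily the authors' exact procedure), the usual normal approximation to the binomial gives the interval below; the test-set size 82,587 is from the text, while the error count used here is a made-up illustration.

```python
import math

# 95% confidence interval on a test error rate via the normal
# approximation: p +/- 1.96 * sqrt(p * (1 - p) / n).
def error_rate_ci95(n_errors, n_test):
    p = n_errors / n_test
    half = 1.96 * math.sqrt(p * (1.0 - p) / n_test)
    return p - half, p + half

# Hypothetical error count on the 82,587-example NIST test set.
lo, hi = error_rate_ci95(1500, 82587)
print(lo, hi)
```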