comparison writeup/nips2010_submission.tex @ 539:84f42fe05594
merge
author | Dumitru Erhan <dumitru.erhan@gmail.com> |
date | Tue, 01 Jun 2010 19:34:22 -0700 |
parents | f0ee2212ea7c caf7769ca19c |
children | 8aad1c6ec39a |
538:f0ee2212ea7c | 539:84f42fe05594 |
355 with 60~000 examples, and variants involving 10~000 | 355 with 60~000 examples, and variants involving 10~000 |
356 examples~\citep{Larochelle-jmlr-toappear-2008,VincentPLarochelleH2008}, we want | 356 examples~\citep{Larochelle-jmlr-toappear-2008,VincentPLarochelleH2008}, we want |
357 to focus here on the case of much larger training sets, from 10 times to | 357 to focus here on the case of much larger training sets, from 10 times to |
358 1000 times larger. | 358 1000 times larger. |
359 | 359 |
360 The first step in constructing the larger datasets is to sample from | 360 The first step in constructing the larger datasets (called NISTP and P07) is to sample from |
361 a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas}, | 361 a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas}, |
362 and {\bf OCR data} (scanned machine printed characters). Once a character | 362 and {\bf OCR data} (scanned machine printed characters). Once a character |
363 is sampled from one of these sources (chosen randomly), the pipeline of | 363 is sampled from one of these sources (chosen randomly), the pipeline of |
364 the transformations and/or noise processes described in section \ref{s:perturbations} | 364 the transformations and/or noise processes described in section \ref{s:perturbations} |
365 is applied to the image. | 365 is applied to the image. |
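The sampling-then-perturbation pipeline described above can be sketched as follows. This is an editor's illustration only, not the authors' generator: the source functions and the two transformations are hypothetical placeholders standing in for NIST, Fonts, Captchas, OCR data, and the perturbation operators of section \ref{s:perturbations}.

```python
import random

# Hypothetical stand-ins for the four data sources; the real system
# samples character images (e.g. from NIST database 19).
def sample_nist():    return "nist-image"
def sample_fonts():   return "fonts-image"
def sample_captcha(): return "captcha-image"
def sample_ocr():     return "ocr-image"

SOURCES = [sample_nist, sample_fonts, sample_captcha, sample_ocr]

# Placeholder transformations; the paper's pipeline applies geometric
# distortions and noise processes to the sampled image.
def distort(img):   return img + "+distort"
def add_noise(img): return img + "+noise"

PIPELINE = [distort, add_noise]

def generate_example(rng=random):
    """Pick a source at random, then push the character through the pipeline."""
    img = rng.choice(SOURCES)()
    for transform in PIPELINE:
        img = transform(img)
    return img
```

Repeating `generate_example` many times yields the larger perturbed datasets (NISTP, P07); the choice of source and the stochastic transformation parameters differ per example.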
366 | 366 |
367 We compare the best MLP against | 367 We compare the best MLPs against |
368 the best SDA (both models' hyper-parameters are selected to minimize the validation set error), | 368 the best SDAs (both models' hyper-parameters are selected to minimize the validation set error), |
369 along with a comparison against a precise estimate | 369 along with a comparison against a precise estimate |
370 of human performance obtained via Amazon's Mechanical Turk (AMT) | 370 of human performance obtained via Amazon's Mechanical Turk (AMT) |
371 service (http://mturk.com). | 371 service (http://mturk.com). |
372 AMT users are paid small amounts | 372 AMT users are paid small amounts |
373 of money to perform tasks for which human intelligence is required. | 373 of money to perform tasks for which human intelligence is required. |
523 comparable or better than RBMs in a series of experiments | 523 comparable or better than RBMs in a series of experiments |
524 \citep{VincentPLarochelleH2008}. During training, a Denoising | 524 \citep{VincentPLarochelleH2008}. During training, a Denoising |
525 Auto-Encoder is presented with a stochastically corrupted version | 525 Auto-Encoder is presented with a stochastically corrupted version |
526 of the input and trained to reconstruct the uncorrupted input, | 526 of the input and trained to reconstruct the uncorrupted input, |
527 forcing the hidden units to represent the leading regularities in | 527 forcing the hidden units to represent the leading regularities in |
528 the data. Once it is trained, its hidden units' activations can | 528 the data. Once it is trained, in a purely unsupervised way, |
| 529 its hidden units' activations can |
529 be used as inputs for training a second one, etc. | 530 be used as inputs for training a second one, etc. |
530 After this unsupervised pre-training stage, the parameters | 531 After this unsupervised pre-training stage, the parameters |
531 are used to initialize a deep MLP, which is fine-tuned by | 532 are used to initialize a deep MLP, which is fine-tuned by |
532 the same standard procedure used to train MLPs (see previous section). | 533 the same standard procedure used to train MLPs (see previous section). |
533 The SDA hyper-parameters are the same as for the MLP, with the addition of the | 534 The SDA hyper-parameters are the same as for the MLP, with the addition of the |
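The greedy layer-wise procedure in this passage — corrupt the input, reconstruct the clean version, then feed hidden activations to the next layer — can be sketched in NumPy. This is a minimal editor's sketch, not the authors' Theano implementation; the masking-noise corruption, squared-error loss, tied weights, layer sizes, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    """One layer of an SDA: encode a corrupted input, reconstruct the clean one."""

    def __init__(self, n_visible, n_hidden, corruption=0.25, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))  # tied weights
        self.b_h = np.zeros(n_hidden)    # encoder (hidden) bias
        self.b_v = np.zeros(n_visible)   # decoder (reconstruction) bias
        self.corruption = corruption
        self.lr = lr

    def hidden(self, x):
        return sigmoid(x @ self.W + self.b_h)

    def reconstruct(self, x):
        return sigmoid(self.hidden(x) @ self.W.T + self.b_v)

    def train_step(self, x):
        """One SGD step on a mini-batch x of shape (batch, n_visible)."""
        # Stochastic corruption: randomly zero out a fraction of each input.
        keep = rng.random(x.shape) > self.corruption
        x_tilde = x * keep
        h = self.hidden(x_tilde)                 # encode the CORRUPTED input
        z = sigmoid(h @ self.W.T + self.b_v)     # reconstruct
        # Backprop of squared reconstruction error against the CLEAN input x.
        dz = (z - x) * z * (1.0 - z)             # grad at decoder pre-activation
        dh = (dz @ self.W) * h * (1.0 - h)       # grad at encoder pre-activation
        n = x.shape[0]
        self.W -= self.lr * (dz.T @ h + x_tilde.T @ dh) / n  # decoder + encoder parts
        self.b_v -= self.lr * dz.sum(axis=0) / n
        self.b_h -= self.lr * dh.sum(axis=0) / n

# Stacking: train the first layer on the data, then train a second layer
# on the first layer's hidden activations, and so on; the resulting
# weights initialize a deep MLP that is fine-tuned with supervision.
X = (rng.random((500, 64)) > 0.5).astype(float)  # toy binary "images"
dae1 = DenoisingAutoencoder(64, 32)
for _ in range(200):
    dae1.train_step(X)
dae2 = DenoisingAutoencoder(32, 16)
H = dae1.hidden(X)
for _ in range(200):
    dae2.train_step(H)
```

The supervised fine-tuning stage is omitted here; after pre-training, `dae1.W` and `dae2.W` would initialize the hidden layers of an MLP trained by the standard procedure of the previous section.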
558 \section{Experimental Results} | 559 \section{Experimental Results} |
559 | 560 |
560 %\vspace*{-1mm} | 561 %\vspace*{-1mm} |
561 %\subsection{SDA vs MLP vs Humans} | 562 %\subsection{SDA vs MLP vs Humans} |
562 %\vspace*{-1mm} | 563 %\vspace*{-1mm} |
563 | 564 The models are either trained on NIST (MLP0 and SDA0), |
| 565 NISTP (MLP1 and SDA1), or P07 (MLP2 and SDA2), and tested |
| 566 on either NIST, NISTP or P07, either on all 62 classes |
| 567 or only on the digits (considering only the outputs |
| 568 associated with digit classes). |
564 Figure~\ref{fig:error-rates-charts} summarizes the results obtained, | 569 Figure~\ref{fig:error-rates-charts} summarizes the results obtained, |
565 comparing Humans, three MLPs (MLP0, MLP1, MLP2) and three SDAs (SDA0, SDA1, | 570 comparing Humans, the three MLPs (MLP0, MLP1, MLP2) and the three SDAs (SDA0, SDA1, |
566 SDA2), along with the previous results on the digits NIST special database | 571 SDA2), along with the previous results on the digits NIST special database |
567 19 test set from the literature, based respectively on ARTMAP neural | 572 19 test set from the literature, based respectively on ARTMAP neural |
568 networks~\citep{Granger+al-2007}, fast nearest-neighbor | 573 networks~\citep{Granger+al-2007}, fast nearest-neighbor |
569 search~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002-short}, and | 574 search~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002-short}, and |
570 SVMs~\citep{Milgram+al-2005}. More detailed and complete numerical results | 575 SVMs~\citep{Milgram+al-2005}. More detailed and complete numerical results |
571 (figures and tables, including standard errors on the error rates) can be | 576 (figures and tables, including standard errors on the error rates) can be |
572 found in Appendix I of the supplementary material. The 3 kinds of model differ in the | 577 found in Appendix I of the supplementary material. |
573 training sets used: NIST only (MLP0,SDA0), NISTP (MLP1, SDA1), or P07 | 578 The deep learner not only outperformed the shallow ones and |
574 (MLP2, SDA2). The deep learner not only outperformed the shallow ones and | |
575 previously published performance (in a statistically and qualitatively | 579 previously published performance (in a statistically and qualitatively |
576 significant way) but reaches human performance on both the 62-class task | 580 significant way) but when trained with perturbed data |
| 581 reaches human performance on both the 62-class task |
577 and the 10-class (digits) task. | 582 and the 10-class (digits) task. |
578 | 583 |
579 \begin{figure}[ht] | 584 \begin{figure}[ht] |
580 \vspace*{-2mm} | 585 \vspace*{-2mm} |
581 \centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}} | 586 \centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}} |