% writeup/nips2010_submission.tex, changeset 535:caf7769ca19c ("typo")
% author: Yoshua Bengio <bengioy@iro.umontreal.ca>
% date: Tue, 01 Jun 2010 22:25:35 -0400
with 60~000 examples, and variants involving 10~000
examples~\citep{Larochelle-jmlr-toappear-2008,VincentPLarochelleH2008}, we want
to focus here on the case of much larger training sets, from 10
to 1000 times larger.

The first step in constructing the larger datasets (called NISTP and P07) is to sample from
a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
and {\bf OCR data} (scanned machine-printed characters). Once a character
is sampled from one of these sources (chosen randomly), the pipeline of
transformations and/or noise processes described in section \ref{s:perturbations}
is applied to the image.

We compare the best MLPs against
the best SDAs (both models' hyper-parameters being selected to minimize
the validation set error), and also against a precise estimate
of human performance obtained via Amazon's Mechanical Turk (AMT)
service (http://mturk.com).
AMT users are paid small amounts
of money to perform tasks for which human intelligence is required.
from one of the 62 character classes.
%\begin{itemize}

%\item
{\bf NIST.} This is the raw NIST special database 19~\citep{Grother-1995}. It has
\{651668 / 80000 / 82587\} \{training / validation / test\} examples.

%\item
{\bf P07.} This dataset is obtained by taking raw characters from all four of the above sources
and sending them through the transformation pipeline described in section \ref{s:perturbations}.
For each example to be generated, a data source is selected with probability $10\%$ for the fonts,
$25\%$ for the captchas, $25\%$ for the OCR data, and $40\%$ for NIST. We apply all the transformations in the
order given above, and for each of them we sample uniformly a \emph{complexity} in the range $[0,0.7]$.
It has \{81920000 / 80000 / 20000\} \{training / validation / test\} examples.
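The generation scheme just described can be sketched as follows. This is an illustrative reconstruction, not the authors' code: \texttt{load\_from} and the transformation functions are hypothetical stand-ins for the actual data sources and the perturbation pipeline of section \ref{s:perturbations}.

```python
import random

# Source mixture for P07, as given in the text.
SOURCE_WEIGHTS = {"Fonts": 0.10, "Captchas": 0.25, "OCR": 0.25, "NIST": 0.40}

def generate_p07_example(load_from, transformations, rng=random):
    """Draw one P07 example: pick a data source according to the
    mixture above, then apply every transformation in the fixed
    order, each with a complexity drawn uniformly from [0, 0.7]."""
    sources = list(SOURCE_WEIGHTS)
    weights = list(SOURCE_WEIGHTS.values())
    source = rng.choices(sources, weights=weights)[0]
    image = load_from(source)            # raw character image
    for transform in transformations:    # applied in the fixed order
        complexity = rng.uniform(0.0, 0.7)
        image = transform(image, complexity)
    return image
```

Under this sketch, NISTP would use the same source mixture but keep only the deformation transformations (slant through pinch), so no noise is added to the image.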

%\item
{\bf NISTP.} This one is equivalent to P07 (complexity parameter of $0.7$ with the same proportions of data sources)
except that we only apply
transformations from slant to pinch. Therefore, the character is
transformed but no additional noise is added to the image, giving images
closer to the NIST dataset.
It has \{81920000 / 80000 / 20000\} \{training / validation / test\} examples.
%\end{itemize}

\vspace*{-1mm}
\subsection{Models and their Hyperparameters}
\vspace*{-1mm}
comparable to or better than RBMs in a series of experiments
\citep{VincentPLarochelleH2008}. During training, a Denoising
Auto-Encoder is presented with a stochastically corrupted version
of the input and trained to reconstruct the uncorrupted input,
forcing the hidden units to represent the leading regularities in
the data. Once it is trained, in a purely unsupervised way,
its hidden unit activations can
be used as inputs for training a second one, etc.
After this unsupervised pre-training stage, the parameters
are used to initialize a deep MLP, which is fine-tuned by
the same standard procedure used to train MLPs (see previous section).
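Concretely, one layer of this denoising pre-training can be sketched as below. This is a simplified illustration assuming sigmoid units, tied weights, squared reconstruction error, and zeroing corruption (one of the variants studied in \citep{VincentPLarochelleH2008}); the exact configuration used here may differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_step(W, b_h, b_v, x, corruption=0.25, lr=0.01, rng=np.random):
    """One update of a tied-weight denoising auto-encoder: corrupt the
    input, encode, decode, then take a gradient step on the squared
    reconstruction error measured against the *uncorrupted* input."""
    mask = rng.rand(*x.shape) > corruption   # zero out a random subset
    x_tilde = x * mask
    h = sigmoid(W @ x_tilde + b_h)           # hidden representation
    x_hat = sigmoid(W.T @ h + b_v)           # reconstruction
    # Backprop of 0.5 * ||x_hat - x||^2 through both sigmoids.
    d_v = (x_hat - x) * x_hat * (1.0 - x_hat)
    d_h = (W @ d_v) * h * (1.0 - h)
    W -= lr * (np.outer(d_h, x_tilde) + np.outer(h, d_v))
    b_h -= lr * d_h
    b_v -= lr * d_v
    return W, b_h, b_v
```

After training, the activations \texttt{sigmoid(W @ x + b\_h)} on clean inputs provide the representation fed to the next layer, and the learned \texttt{W}, \texttt{b\_h} initialize the corresponding layer of the deep MLP before fine-tuning.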
The SDA hyper-parameters are the same as for the MLP, with the addition of the
\section{Experimental Results}

%\vspace*{-1mm}
%\subsection{SDA vs MLP vs Humans}
%\vspace*{-1mm}
The models are trained on either NIST (MLP0 and SDA0),
NISTP (MLP1 and SDA1), or P07 (MLP2 and SDA2), and tested
on NIST, NISTP or P07, either on all 62 classes
or only on the digits (considering only the outputs
associated with digit classes).
Figure~\ref{fig:error-rates-charts} summarizes the results obtained,
comparing Humans, the three MLPs (MLP0, MLP1, MLP2) and the three SDAs (SDA0, SDA1,
SDA2), along with previously published results on the NIST special database
19 digits test set, respectively based on ARTMAP neural
networks~\citep{Granger+al-2007}, fast nearest-neighbor
search~\citep{Cortes+al-2000}, MLPs~\citep{Oliveira+al-2002-short}, and
SVMs~\citep{Milgram+al-2005}. More detailed and complete numerical results
(figures and tables, including standard errors on the error rates) can be
found in Appendix I of the supplementary material.
The deep learner not only outperforms the shallow ones and
previously published results (in a statistically and qualitatively
significant way) but, when trained with perturbed data,
reaches human performance on both the 62-class task
and the 10-class (digits) task.

\begin{figure}[ht]
\vspace*{-2mm}
\centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}}