comparison writeup/techreport.tex @ 458:c0f738f0cef0
added many results
author   | Yoshua Bengio <bengioy@iro.umontreal.ca>
date     | Thu, 27 May 2010 08:29:04 -0600
parents  | b0622f78cfec
children | fe292653a0f8
454:df56627d5399 (old) | 458:c0f738f0cef0 (new)
370 characters. Hence they were forced to make a hard choice among the | 370 characters. Hence they were forced to make a hard choice among the |
371 62 character classes. Three users classified each image, allowing | 371 62 character classes. Three users classified each image, allowing |
372 us to estimate inter-human variability (shown as +/- in parentheses below). | 372 us to estimate inter-human variability (shown as +/- in parentheses below). |
373 | 373 |
374 \begin{table} | 374 \begin{table} |
375 \caption{Overall comparison of error rates on 62 character classes (10 digits + | 375 \caption{Overall comparison of error rates ($\pm$ std.err.) on 62 character classes (10 digits + |
376 26 lower + 26 upper), except for the last column -- digits only, between a deep architecture with pre-training | 376 26 lower + 26 upper), except for the last column -- digits only, between a deep architecture with pre-training |
377 (SDA=Stacked Denoising Autoencoder) and an ordinary shallow architecture | 377 (SDA=Stacked Denoising Autoencoder) and an ordinary shallow architecture |
378 (MLP=Multi-Layer Perceptron). } | 378 (MLP=Multi-Layer Perceptron). The models shown are all trained using perturbed data (NISTP or P07) |
 | 379 and using a validation set to select hyper-parameters and other training choices. |
 | 380 \{SDA,MLP\}0 are trained on NIST, |
 | 381 \{SDA,MLP\}1 are trained on NISTP, and \{SDA,MLP\}2 are trained on P07. |
 | 382 The human error rate on digits is a lower bound because it does not count digits that were |
 | 383 recognized as letters.} |
379 \label{tab:sda-vs-mlp-vs-humans} | 384 \label{tab:sda-vs-mlp-vs-humans} |
380 \begin{center} | 385 \begin{center} |
381 \begin{tabular}{|l|r|r|r|r|} \hline | 386 \begin{tabular}{|l|r|r|r|r|} \hline |
382 & NIST test & NISTP test & P07 test & NIST test digits \\ \hline | 387 & NIST test & NISTP test & P07 test & NIST test digits \\ \hline |
383 Humans& & & & \\ \hline | 388 Humans& 18.2\% $\pm$.1\% & 39.4\%$\pm$.1\% & 46.9\%$\pm$.1\% & $>1.1\%$ \\ \hline |
384 SDA & & & &\\ \hline | 389 SDA0 & 23.7\% $\pm$.14\% & 65.2\%$\pm$.34\% & 97.45\%$\pm$.06\% & 2.7\% $\pm$.14\%\\ \hline |
385 MLP & & & & \\ \hline | 390 SDA1 & 17.1\% $\pm$.13\% & 29.7\%$\pm$.3\% & 29.7\%$\pm$.3\% & 1.4\% $\pm$.1\%\\ \hline |
 | 391 SDA2 & 18.7\% $\pm$.13\% & 33.6\%$\pm$.3\% & 39.9\%$\pm$.17\% & 1.7\% $\pm$.1\%\\ \hline |
 | 392 MLP0 & 24.2\% $\pm$.15\% & \%$\pm$.35\% & \%$\pm$.1\% & 3.45\% $\pm$.16\% \\ \hline |
 | 393 MLP1 & 23.0\% $\pm$.15\% & 41.8\%$\pm$.35\% & 90.4\%$\pm$.1\% & 3.85\% $\pm$.16\% \\ \hline |
 | 394 MLP2 & ?\% $\pm$.15\% & ?\%$\pm$.35\% & 90.4\%$\pm$.1\% & 3.85\% $\pm$.16\% \\ \hline |
386 \end{tabular} | 395 \end{tabular} |
387 \end{center} | 396 \end{center} |
388 \end{table} | 397 \end{table} |
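Note: the ($\pm$ std.err.) figures added to the caption are presumably binomial standard errors of the estimated error rates; assuming that is how they were computed (the source does not state it explicitly), the standard error of an error rate $\hat p$ measured on $n$ test examples would be
\[ \mathrm{SE}(\hat p) \;=\; \sqrt{\frac{\hat p\,(1-\hat p)}{n}} . \]
For instance, a purely illustrative $\hat p = 0.25$ measured on $n = 10{,}000$ test examples gives $\mathrm{SE} \approx 0.43\%$.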
389 | 398 |
390 \subsection{Perturbed Training Data More Helpful for SDAE} | 399 \subsection{Perturbed Training Data More Helpful for SDAE} |
391 | 400 |
 | 401 |
392 \subsection{Training with More Classes than Necessary} | 402 \subsection{Training with More Classes than Necessary} |
393 | 403 |
394 As previously seen, the SDA is better able to benefit from the transformations applied to the data than the MLP. We now train SDAs and MLPs on single class groups from NIST (digits, lower-case characters, and upper-case characters, respectively), and compare their test results with those of models trained on the entire NIST database (per-class-group test error, with an a priori on the desired class group). The goal is to find out whether training the model with more classes than necessary reduces the test error on a single class group, as opposed to training it only on the desired classes. We use a single-hidden-layer MLP with 1000 hidden units, and an SDA with 3 hidden layers (1000 hidden units per layer), pre-trained and fine-tuned on NIST. | 404 As previously seen, the SDA is better able to benefit from the transformations applied to the data than the MLP. We now train SDAs and MLPs on single class groups from NIST (digits, lower-case characters, and upper-case characters, respectively), and compare their test results with those of models trained on the entire NIST database (per-class-group test error, with an a priori on the desired class group). The goal is to find out whether training the model with more classes than necessary reduces the test error on a single class group, as opposed to training it only on the desired classes. We use a single-hidden-layer MLP with 1000 hidden units, and an SDA with 3 hidden layers (1000 hidden units per layer), pre-trained and fine-tuned on NIST. |
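Note: "per-class-group test error, with an a priori on the desired class group" can be read as follows: at test time the model knows which group (digits, lower case, or upper case) the example belongs to, and its prediction is restricted to that group's output units. Below is a minimal numpy sketch of that evaluation for a 62-way classifier that returns output probabilities; the function name, the (probs, labels) interface, and the class ordering are illustrative assumptions, not the authors' code.

    import numpy as np

    def per_group_error(probs, labels, group):
        """Test error on one class group (e.g. the 10 digits), assuming the
        desired group is known a priori: the prediction is the argmax over
        that group's output units only.
        probs:  (n_examples, 62) array of output probabilities
        labels: (n_examples,) integer array of class indices
        group:  iterable of class indices belonging to the group
        Illustrative sketch, not the authors' code."""
        group = np.asarray(list(group))
        mask = np.isin(labels, group)              # keep only test examples of that group
        sub = probs[mask][:, group]                # restrict the outputs to the group's units
        preds = group[np.argmax(sub, axis=1)]      # argmax within the group
        return float(np.mean(preds != labels[mask]))

    # e.g., if the 10 digit classes occupy output indices 0..9 (assumed ordering):
    # digit_error = per_group_error(test_probs, test_labels, group=range(10))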
395 | 405 |
396 Our results show that the MLP benefits from training on the full NIST database only for digits, and even there its test error is only 5\% smaller than that of a digits-specialized MLP. On the other hand, the SDA always gives better results when trained on the entire NIST database than its specialized counterparts (the test error is 12\% smaller on upper-case characters, 27\% smaller on digits, and 15\% smaller on lower-case characters). | 406 Our results show that the MLP benefits from training on the full NIST database only for digits, and even there its test error is only 5\% smaller than that of a digits-specialized MLP. On the other hand, the SDA always gives better results when trained on the entire NIST database than its specialized counterparts (the test error is 12\% smaller on upper-case characters, 27\% smaller on digits, and 15\% smaller on lower-case characters). |
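Note: assuming the "X\% smaller" figures above are relative reductions in test error (an interpretation; the source does not say so explicitly), they correspond to
\[ \Delta_{\mathrm{rel}} \;=\; \frac{\epsilon_{\mathrm{specialized}} - \epsilon_{\mathrm{full\,NIST}}}{\epsilon_{\mathrm{specialized}}}, \]
so that, for instance, a specialized model at 10\% test error and a full-NIST model at 8.8\% would give a 12\% relative reduction.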