comparison writeup/techreport.tex @ 440:89258bb41e4c
updating values in Training with More Classes than Necessary
author:   Guillaume Sicard <guitch21@gmail.com>
date:     Mon, 03 May 2010 12:18:03 -0400
parents:  a6d339033d03
children: 1272dc84a30c
comparison: 439:5ca2936f2062 (before) → 440:89258bb41e4c (after)
  363
  364  \subsection{Training with More Classes than Necessary}
  365
  366  As previously seen, the SDA is better able to benefit from the transformations applied to the data than the MLP. We now train SDAs and MLPs on single classes from NIST (respectively digits, lower case characters and upper case characters) and compare their test results with those of models trained on the entire NIST database (per-class test error, with the desired class known a priori). The goal is to find out whether training the model with more classes than necessary reduces the test error on a single class, as opposed to training it only on the desired class. We use a single-hidden-layer MLP with 1000 hidden units, and an SDA with 3 hidden layers (1000 hidden units per layer), pre-trained and fine-tuned on NIST.
  367
- 368  Our results show that the MLP only benefits from a full NIST training on digits, and the test error is only 5\% smaller than a digits-specialized MLP. On the other hand, the SDA always gives better results when it is trained with the entire NIST database, compared to its specialized counterparts (with upper case character, the test errors are identical, but 27\% smaller on digits, and 9.4\% smaller on lower case characters).
+ 368  Our results show that the MLP only benefits from full NIST training on digits, and even there its test error is only 5\% smaller than that of a digits-specialized MLP. On the other hand, the SDA trained on the entire NIST database always gives better results than its specialized counterparts (the test error is 12\% smaller on upper case characters, 27\% smaller on digits, and 15\% smaller on lower case characters).
  369
  370  \section{Conclusions}
  371
  372  \bibliography{strings,ml,aigaion,specials}
  373  \bibliographystyle{mlapa}
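
For context on the models named at line 366, here is a minimal plain-numpy sketch of the two architectures sized as in the text: an MLP with a single hidden layer of 1000 units, and an SDA-shaped network with 3 hidden layers of 1000 units each. This is an illustration, not the repository's actual implementation; the 32x32 input size, the 62 NIST classes, and the sigmoid activations are assumptions, and the unsupervised denoising pre-training of the SDA layers is omitted.

    import numpy as np

    rng = np.random.default_rng(0)

    def layer(n_in, n_out):
        """A dense layer: small random weights and zero biases."""
        return rng.normal(0.0, 0.01, size=(n_in, n_out)), np.zeros(n_out)

    def forward(x, hidden_layers, out_layer):
        """Sigmoid hidden layers followed by a softmax output layer."""
        h = x
        for W, b in hidden_layers:
            h = 1.0 / (1.0 + np.exp(-(h @ W + b)))   # sigmoid activation (assumed)
        W, b = out_layer
        logits = h @ W + b
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)      # class probabilities

    n_in, n_classes = 32 * 32, 62   # assumed: 32x32 images, 62 NIST classes

    # MLP described in the text: one hidden layer of 1000 units.
    mlp = ([layer(n_in, 1000)], layer(1000, n_classes))

    # SDA described in the text: 3 hidden layers of 1000 units each.
    # In the report these layers are pre-trained as denoising autoencoders
    # before supervised fine-tuning; that unsupervised step is omitted here.
    sda = ([layer(n_in, 1000), layer(1000, 1000), layer(1000, 1000)],
           layer(1000, n_classes))

    x = rng.random((4, n_in))              # a dummy mini-batch of 4 images
    print(forward(x, *mlp).shape)          # -> (4, 62)
    print(forward(x, *sda).shape)          # -> (4, 62)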
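
One editorial clarification of the updated values on line 368: the percentages read as relative reductions of the per-class test error obtained by training on all of NIST instead of on the target class set alone. In LaTeX, with the error names chosen here only for illustration:

    % Illustrative definition (not taken from the report): relative
    % reduction in per-class test error from training on all of NIST.
    \[
      \Delta_{\mathrm{rel}} =
        \frac{\mathrm{err}_{\mathrm{specialized}} - \mathrm{err}_{\mathrm{full\,NIST}}}
             {\mathrm{err}_{\mathrm{specialized}}},
      \qquad
      \mbox{e.g.\ } \Delta_{\mathrm{rel}} = 0.27 \mbox{ for the SDA on digits.}
    \]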