Mercurial > ift6266
comparison writeup/nips2010_submission.tex @ 524:07bc0ca8d246
added paragraph comparing "our" self-taught learning with "theirs"
author   | Dumitru Erhan <dumitru.erhan@gmail.com>
date     | Tue, 01 Jun 2010 14:06:43 -0700
parents  | c778d20ab6f8
children | 4354c3c8f49c 8fe77eac344f
comparing 523:c778d20ab6f8 with 524:07bc0ca8d246
686 Whereas the improvement due to the multi-task setting was marginal or
687 negative for the MLP (from +5.6\% to -3.6\% relative change),
688 it was very significant for the SDA (from +13\% to +27\% relative change).
689 %\end{itemize}
690
inserted in 524:07bc0ca8d246:
691 In the original self-taught learning framework~\citep{RainaR2007}, the
692 out-of-sample examples were used as a source of unsupervised data, and
693 experiments showed their positive effect in a \emph{limited labeled data}
694 scenario. However, many of the results of \citet{RainaR2007} (who used a
695 shallow, sparse coding approach) suggest that the relative gain of
696 self-taught learning diminishes as the number of labeled examples
697 increases (a ``diminishing returns'' scenario). In contrast, our
698 experiments show that, for deep architectures, the positive effect is
699 obtained even with a \emph{very large number of labeled examples}.
700
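The self-taught learning recipe discussed in the inserted paragraph, learning features without labels on out-of-sample data and then training a supervised predictor on top of them, can be sketched minimally. This is a hypothetical toy example in NumPy (a one-layer autoencoder plus logistic regression), not the paper's actual stacked denoising autoencoder or \citet{RainaR2007}'s sparse coding; all names and hyperparameters below are illustrative.

```python
# Toy sketch of self-taught learning (illustrative only):
# 1) unsupervised phase: fit a one-layer autoencoder on *unlabeled* data;
# 2) supervised phase: train a classifier on the learned features.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_autoencoder(X, n_hidden=16, lr=0.1, epochs=50):
    """Tied-weights autoencoder: sigmoid encoder, linear decoder."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, (n_in, n_hidden))
    b = np.zeros(n_hidden)   # encoder bias
    c = np.zeros(n_in)       # decoder bias
    for _ in range(epochs):
        H = sigmoid(X @ W + b)           # encode
        R = H @ W.T + c                  # decode (tied weights)
        err = R - X                      # reconstruction error
        dH = (err @ W) * H * (1.0 - H)   # backprop through encoder
        W -= lr * (X.T @ dH + err.T @ H) / len(X)
        b -= lr * dH.mean(axis=0)
        c -= lr * err.mean(axis=0)
    return W, b

def train_classifier(H, y, lr=0.5, epochs=200):
    """Logistic regression on the (frozen) learned features."""
    w = np.zeros(H.shape[1])
    for _ in range(epochs):
        p = sigmoid(H @ w)
        w -= lr * H.T @ (p - y) / len(H)
    return w

# Unlabeled pool is larger than the labeled set, as in self-taught learning.
X_unlabeled = rng.normal(size=(200, 8))
X_labeled = rng.normal(size=(40, 8))
y = (X_labeled[:, 0] > 0).astype(float)

W, b = pretrain_autoencoder(X_unlabeled)        # phase 1: no labels used
features = sigmoid(X_labeled @ W + b)           # reuse encoder as features
w = train_classifier(features, y)               # phase 2: labels used
acc = ((sigmoid(features @ w) > 0.5) == y).mean()
```

The point of the sketch is only the two-phase structure: the encoder parameters are fit without any labels, and the small labeled set is used only for the final predictor, which is where the "limited labeled data" benefit comes from.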
701 Why would deep learners benefit more from the self-taught learning framework?
702 The key idea is that the lower layers of the predictor compute a hierarchy
703 of features that can be shared across tasks or across variants of the
704 input distribution. Intermediate features that can be used in different
705 contexts can be estimated in a way that allows sharing statistical