comparison writeup/nips2010_submission.tex @ 524:07bc0ca8d246

added paragraph comparing "our" self-taught learning with "theirs"
author Dumitru Erhan <dumitru.erhan@gmail.com>
date Tue, 01 Jun 2010 14:06:43 -0700
parents c778d20ab6f8
children 4354c3c8f49c 8fe77eac344f
686 Whereas the improvement due to the multi-task setting was marginal or
687 negative for the MLP (from +5.6\% to -3.6\% relative change),
688 it was very significant for the SDA (from +13\% to +27\% relative change).
689 %\end{itemize}
690
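One plausible way to read these figures, assuming ``relative change'' refers to test error (the exact definition is not given in this excerpt), is
\[
\Delta_{\mathrm{rel}} = \frac{e_{\mathrm{single}} - e_{\mathrm{multi}}}{e_{\mathrm{single}}},
\]
so that a positive value would mean the multi-task setting reduced the error.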
691 In the original self-taught learning framework~\citep{RainaR2007}, the
692 out-of-sample examples were used as a source of unsupervised data, and
693 experiments showed the positive effect of this strategy in a \emph{limited
694 labeled data} scenario. However, many of the results by \citet{RainaR2007}
695 (who used a shallow, sparse coding approach) suggest that the relative gain
696 of self-taught learning diminishes as the number of labeled examples
697 increases (essentially, a ``diminishing returns'' scenario). We note that,
698 for deep architectures, our experiments show that such a positive effect
699 persists even in a scenario with a \emph{very large number of labeled examples}.
700
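To make the recipe concrete, the following is a minimal sketch of the self-taught learning setup described above, under stated assumptions: a single-layer denoising autoencoder in plain NumPy with synthetic placeholder data and arbitrary hyper-parameters, not the stacked, Theano-based SDA actually used in the paper. Phase 1 pre-trains on unlabeled (possibly out-of-distribution) examples; phase 2 trains a classifier on the learned representation from a small labeled set.

    # Minimal self-taught-learning sketch in NumPy (an illustration, not the
    # paper's implementation).  Data, sizes and learning rates are placeholders.
    import numpy as np

    rng = np.random.RandomState(0)
    n_in, n_hidden, n_classes = 784, 200, 10

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # ---- Phase 1: unsupervised pre-training of a denoising autoencoder ----
    # on unlabeled, possibly out-of-distribution examples (tied weights W).
    X_unlab = rng.rand(1000, n_in)                 # placeholder unlabeled images
    W = rng.normal(scale=0.01, size=(n_in, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_in)
    for epoch in range(10):
        X_tilde = X_unlab * (rng.rand(*X_unlab.shape) > 0.25)  # masking noise
        H = sigmoid(X_tilde @ W + b_h)                          # encode
        R = sigmoid(H @ W.T + b_v)                              # decode
        dR = (R - X_unlab) / len(X_unlab)   # grad of cross-entropy reconstruction
        dH = (dR @ W) * H * (1.0 - H)       # back-propagate through the encoder
        W -= 0.1 * (X_tilde.T @ dH + dR.T @ H)  # encoder + decoder terms (tied)
        b_h -= 0.1 * dH.sum(axis=0)
        b_v -= 0.1 * dR.sum(axis=0)

    # ---- Phase 2: supervised classifier on the pre-trained representation ----
    # The lower layer (W, b_h) learned without labels is reused as-is here;
    # fine-tuning it jointly would be the usual next step.
    X_lab = rng.rand(100, n_in)                    # a small labeled set
    y_lab = rng.randint(n_classes, size=100)
    V, c = np.zeros((n_hidden, n_classes)), np.zeros(n_classes)
    for epoch in range(100):
        H = sigmoid(X_lab @ W + b_h)               # shared, pre-trained features
        logits = H @ V + c
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        P[np.arange(len(y_lab)), y_lab] -= 1.0     # softmax cross-entropy gradient
        P /= len(y_lab)
        V -= 0.1 * (H.T @ P)
        c -= 0.1 * P.sum(axis=0)

In the paper's setting several such layers are stacked and the whole network is fine-tuned; the point of the sketch is only that the pre-trained lower layer is what gets reused by the subsequent supervised task(s), which is the mechanism the next paragraph appeals to.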
701 Why would deep learners benefit more from the self-taught learning framework?
702 The key idea is that the lower layers of the predictor compute a hierarchy
703 of features that can be shared across tasks or across variants of the
704 input distribution. Intermediate features that can be used in different
705 contexts can be estimated in a way that allows one to share statistical