comparison writeup/nips2010_submission.tex @ 524:07bc0ca8d246

added paragraph comparing "our" self-taught learning with "theirs"
author Dumitru Erhan <dumitru.erhan@gmail.com>
date Tue, 01 Jun 2010 14:06:43 -0700
parents c778d20ab6f8
children 4354c3c8f49c 8fe77eac344f
686 Whereas the improvement due to the multi-task setting was marginal or
687 negative for the MLP (from +5.6\% to -3.6\% relative change),
688 it was very significant for the SDA (from +13\% to +27\% relative change).
689 %\end{itemize}
690
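One plausible way to read these figures, assuming ``relative change'' refers to test error (the exact definition is not given in this excerpt), is
\[
\Delta_{\mathrm{rel}} = \frac{e_{\mathrm{single}} - e_{\mathrm{multi}}}{e_{\mathrm{single}}},
\]
so that a positive value would mean the multi-task setting reduced the error.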
691 In the original self-taught learning framework~\citep{RainaR2007}, the
692 out-of-sample examples were used as a source of unsupervised data, and
693 experiments showed the positive effect of this strategy in a \emph{limited
694 labeled data} scenario. However, many of the results by \citet{RainaR2007}
695 (who used a shallow, sparse coding approach) suggest that the relative gain
696 of self-taught learning diminishes as the number of labeled examples
697 increases (essentially, a ``diminishing returns'' scenario). We note that,
698 for deep architectures, our experiments show that such a positive effect
699 persists even in a scenario with a \emph{very large number of labeled examples}.
700
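To make the recipe concrete, the following is a minimal sketch of the self-taught learning setup described above, under stated assumptions: a single-layer denoising autoencoder in plain NumPy with synthetic placeholder data and arbitrary hyper-parameters, not the stacked, Theano-based SDA actually used in the paper. Phase 1 pre-trains on unlabeled (possibly out-of-distribution) examples; phase 2 trains a classifier on the learned representation from a small labeled set.

    # Minimal self-taught-learning sketch in NumPy (an illustration, not the
    # paper's implementation).  Data, sizes and learning rates are placeholders.
    import numpy as np

    rng = np.random.RandomState(0)
    n_in, n_hidden, n_classes = 784, 200, 10

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # ---- Phase 1: unsupervised pre-training of a denoising autoencoder ----
    # on unlabeled, possibly out-of-distribution examples (tied weights W).
    X_unlab = rng.rand(1000, n_in)                 # placeholder unlabeled images
    W = rng.normal(scale=0.01, size=(n_in, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_in)
    for epoch in range(10):
        X_tilde = X_unlab * (rng.rand(*X_unlab.shape) > 0.25)  # masking noise
        H = sigmoid(X_tilde @ W + b_h)                          # encode
        R = sigmoid(H @ W.T + b_v)                              # decode
        dR = (R - X_unlab) / len(X_unlab)   # grad of cross-entropy reconstruction
        dH = (dR @ W) * H * (1.0 - H)       # back-propagate through the encoder
        W -= 0.1 * (X_tilde.T @ dH + dR.T @ H)  # encoder + decoder terms (tied)
        b_h -= 0.1 * dH.sum(axis=0)
        b_v -= 0.1 * dR.sum(axis=0)

    # ---- Phase 2: supervised classifier on the pre-trained representation ----
    # The lower layer (W, b_h) learned without labels is reused as-is here;
    # fine-tuning it jointly would be the usual next step.
    X_lab = rng.rand(100, n_in)                    # a small labeled set
    y_lab = rng.randint(n_classes, size=100)
    V, c = np.zeros((n_hidden, n_classes)), np.zeros(n_classes)
    for epoch in range(100):
        H = sigmoid(X_lab @ W + b_h)               # shared, pre-trained features
        logits = H @ V + c
        P = np.exp(logits - logits.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        P[np.arange(len(y_lab)), y_lab] -= 1.0     # softmax cross-entropy gradient
        P /= len(y_lab)
        V -= 0.1 * (H.T @ P)
        c -= 0.1 * P.sum(axis=0)

In the paper's setting several such layers are stacked and the whole network is fine-tuned; the point of the sketch is only that the pre-trained lower layer is what gets reused by the subsequent supervised task(s), which is the mechanism the next paragraph appeals to.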
701 Why would deep learners benefit more from the self-taught learning framework?
702 The key idea is that the lower layers of the predictor compute a hierarchy
703 of features that can be shared across tasks or across variants of the
704 input distribution. Intermediate features that can be used in different
705 contexts can be estimated in a way that allows one to share statistical