# HG changeset patch
# User Yoshua Bengio
# Date 1294601703 18000
# Node ID 5c67f674d724e4f2121fdf6de85a9de76224ec06
# Parent ea31fee25147ab2e0394cf5c00e478b009c214d2
more changes to rebuttal

diff -r ea31fee25147 -r 5c67f674d724 writeup/aistats_review_response.txt
--- a/writeup/aistats_review_response.txt Sun Jan 09 14:15:04 2011 -0500
+++ b/writeup/aistats_review_response.txt Sun Jan 09 14:35:03 2011 -0500
@@ -3,7 +3,7 @@
 * Comparisons with shallower networks, but using unsupervised pre-training:
 we will add those results to the paper. Previous work in our group with
-very similar data (the InfiniteMNIST dataset were published in JMLR in 20102
+very similar data (the InfiniteMNIST dataset was published in JMLR in 2010,
 "Why Does Unsupervised Pre-training Help Deep Learning?"). The results
 indeed show improvement when going from 1 to 2 and then 3 layers, even when
 using unsupervised pre-training (RBM or Denoising Auto-Encoder).
@@ -65,6 +65,21 @@
 to be fair, if we were to do so, we should also consider the same
 multi-stage decision process for the machine learning algorithms as well.
 
+* Size of labeled set: in our JMLR 2010 paper on deep learning (cited
+above), we already verified the effect of the number of labeled
+examples on deep and shallow learners (with or without unsupervised
+pre-training); see fig. 11 of that paper, which involves data very
+similar to those studied here. Basically (and somewhat surprisingly),
+the deep learners with unsupervised pre-training can take more
+advantage of a large number of labeled examples, presumably because of
+the initialization effect (which benefits from the prior that
+representations useful for P(X) are also useful for P(Y|X)), and this
+effect does not disappear as the number of labeled examples increases.
+Other work in the semi-supervised setting (Lee et al., NIPS 2009,
+"Unsupervised feature learning...") also shows that the advantage of
+unsupervised feature learning by a deep architecture is most pronounced
+in the semi-supervised setting with very few labeled examples. Adding
+the training curves in the self-taught setting of this AISTATS
+submission is a good idea, but unlikely to yield results different
+from those already reported in the literature in similar settings.
-* Size of labeled set:
-
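
To make the "initialization effect" mentioned in the added paragraph concrete, here is a minimal illustrative sketch (not code from the JMLR 2010 paper or from this submission; written in PyTorch, with made-up layer sizes, corruption level, epoch counts, and synthetic data): a denoising auto-encoder is first fit on unlabeled inputs, i.e. it learns a representation useful for modeling P(X), and its encoder weights are then used to initialize a supervised classifier trained for P(Y|X).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n_in, n_hid, n_classes = 784, 256, 10
    X_unlab = torch.rand(1000, n_in)              # unlabeled inputs (synthetic stand-in)
    X_lab = torch.rand(200, n_in)                 # labeled inputs (synthetic stand-in)
    y_lab = torch.randint(0, n_classes, (200,))   # labels (synthetic stand-in)

    # Stage 1: unsupervised pre-training of one denoising auto-encoder layer
    # (the representation is fit to model P(X), without using any labels).
    encoder = nn.Linear(n_in, n_hid)
    decoder = nn.Linear(n_hid, n_in)
    opt = torch.optim.SGD(list(encoder.parameters()) + list(decoder.parameters()), lr=0.1)
    for epoch in range(10):
        corrupted = X_unlab * (torch.rand_like(X_unlab) > 0.25).float()  # masking noise
        recon = torch.sigmoid(decoder(torch.sigmoid(encoder(corrupted))))
        loss = F.binary_cross_entropy(recon, X_unlab)  # reconstruct the clean input
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2: supervised fine-tuning for P(Y|X); the classifier's first layer
    # is initialized with the pre-trained encoder rather than at random.
    classifier = nn.Sequential(encoder, nn.Sigmoid(), nn.Linear(n_hid, n_classes))
    opt = torch.optim.SGD(classifier.parameters(), lr=0.1)
    for epoch in range(10):
        loss = F.cross_entropy(classifier(X_lab), y_lab)
        opt.zero_grad()
        loss.backward()
        opt.step()

The only point of the sketch is the wiring: reusing the pre-trained encoder in stage 2 is what "unsupervised pre-training as initialization" refers to in the rebuttal.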