ift6266: comparison of writeup/aistats_review_response.txt @ 620:5c67f674d724
more changes to rebuttal
author | Yoshua Bengio <bengioy@iro.umontreal.ca> |
date | Sun, 09 Jan 2011 14:35:03 -0500 |
parents | ea31fee25147 |
children | d44c78c90669 |
comparison
619:ea31fee25147 | 620:5c67f674d724 |
---|---|
1 | 1 |
2 We thank the reviewers for their thoughtful comments. Please find our responses below. | 2 We thank the reviewers for their thoughtful comments. Please find our responses below. |
3 | 3 |
4 * Comparisons with shallower networks, but using unsupervised pre-training: | 4 * Comparisons with shallower networks, but using unsupervised pre-training: |
5 We will add those results to the paper. Previous work in our group with | 5 We will add those results to the paper. Previous work in our group with |
6 very similar data (the InfiniteMNIST dataset) was published in JMLR in 20102 | 6 very similar data (the InfiniteMNIST dataset) was published in JMLR in 2010 |
7 "Why Does Unsupervised Pre-training Help Deep Learning?"). The results indeed | 7 "Why Does Unsupervised Pre-training Help Deep Learning?"). The results indeed |
8 show improvement when going from 1 to 2 and then 3 layers, even when using | 8 show improvement when going from 1 to 2 and then 3 layers, even when using |
9 unsupervised pre-training (RBM or Denoising Auto-Encoder). | 9 unsupervised pre-training (RBM or Denoising Auto-Encoder). |
10 | 10 |
11 * Comparisons with SVMs. We have tried several kinds of SVMs. The main limitation | 11 * Comparisons with SVMs. We have tried several kinds of SVMs. The main limitation |
63 the suggestion about multi-stage questionnaires, we will definitely | 63 the suggestion about multi-stage questionnaires, we will definitely |
64 consider this as an option next time we perform this experiment. However, | 64 consider this as an option next time we perform this experiment. However, |
65 to be fair, if we were to do so, we should also consider the same | 65 to be fair, if we were to do so, we should also consider the same |
66 multi-stage decision process for the machine learning algorithms as well. | 66 multi-stage decision process for the machine learning algorithms as well. |
67 | 67 |
 | 68 * Size of labeled set: in our JMLR 2010 paper on deep learning (cited |
 | 69 above), we already verified the effect of the number of labeled examples on |
 | 70 the deep learners and shallow learners (with or without unsupervised |
 | 71 pre-training); see fig. 11 of that paper, which involves data very similar |
 | 72 to those studied here. Basically (and somewhat surprisingly) the deep |
 | 73 learners with unsupervised pre-training can take more advantage of a large |
 | 74 number of labeled examples, presumably because of the initialization effect |
 | 75 (which benefits from the prior that representations that are useful for P(X) |
 | 76 are also useful for P(Y|X)), and the effect does not disappear when the |
 | 77 number of labeled examples increases. Other work in the semi-supervised |
 | 78 setting (Lee et al., NIPS 2009, "Unsupervised feature learning...") also shows |
 | 79 that the advantage of unsupervised feature learning by a deep architecture |
 | 80 is most pronounced in the semi-supervised setting with very few labeled |
 | 81 examples. Adding the training curve in the self-taught setting of this AISTATS |
 | 82 submission is a good idea, but is unlikely to provide results |
 | 83 different from those already reported in the literature in similar |
 | 84 settings. |
68 | 85 |
69 * Size of labeled set: | |
70 | |
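
As an illustration of the layer-wise protocol referred to in the response above (pre-training 1, then 2, then 3 layers with RBMs or denoising auto-encoders before supervised training), here is a minimal sketch in NumPy. It is not the code used for the paper: the toy data, layer sizes, learning rates, and the absence of full supervised fine-tuning of the stack are simplifying assumptions made only to show the structure of the procedure.

```python
# Minimal sketch of greedy layer-wise unsupervised pre-training with
# denoising auto-encoders (DAEs), followed by a supervised softmax layer.
# Illustration only: the toy data, layer sizes and learning rates below
# are assumptions, not the settings used in the paper.
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_dae_layer(X, n_hidden, corruption=0.3, lr=0.1, epochs=20):
    """Train one tied-weight denoising auto-encoder on X; return (W, b_hidden)."""
    n_visible = X.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        # Corrupt the input by zeroing a random subset of features.
        mask = rng.binomial(1, 1.0 - corruption, size=X.shape)
        X_tilde = X * mask
        H = sigmoid(X_tilde @ W + b_h)        # encode the corrupted input
        X_rec = sigmoid(H @ W.T + b_v)        # decode with tied weights
        # Gradient of the mean cross-entropy reconstruction error.
        d_v = (X_rec - X) / X.shape[0]
        d_h = (d_v @ W) * H * (1.0 - H)
        W -= lr * (X_tilde.T @ d_h + d_v.T @ H)
        b_h -= lr * d_h.sum(axis=0)
        b_v -= lr * d_v.sum(axis=0)
    return W, b_h

def encode(X, layers):
    """Propagate X through the stack of pre-trained encoders."""
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    return H

# Toy stand-ins for the unlabeled and labeled sets (assumption, not real data).
X_unlab = rng.rand(500, 64)
X_lab = rng.rand(100, 64)
y_lab = rng.randint(0, 10, size=100)

# Greedy layer-wise pre-training: the stack grows from 1 to 2 to 3 layers.
layers = []
for n_hidden in (50, 50, 50):
    H = encode(X_unlab, layers)               # representation from layers trained so far
    layers.append(pretrain_dae_layer(H, n_hidden))

# Supervised stage: softmax regression on top of the deepest representation
# (a full fine-tuning pass would also back-propagate into the stack).
feats = encode(X_lab, layers)
n_classes = 10
V = np.zeros((feats.shape[1], n_classes))
c = np.zeros(n_classes)
Y = np.eye(n_classes)[y_lab]
for _ in range(200):
    logits = feats @ V + c
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    G = (P - Y) / feats.shape[0]
    V -= 0.5 * (feats.T @ G)
    c -= 0.5 * G.sum(axis=0)

pred = (encode(X_lab, layers) @ V + c).argmax(axis=1)
print("toy training accuracy:", (pred == y_lab).mean())
```

Input corruption by zeroing and tied encoder/decoder weights follow the standard denoising auto-encoder recipe; each new layer is trained on the representation produced by the layers already trained, which is what "going from 1 to 2 and then 3 layers" refers to in the response.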