comparison writeup/aistats_review_response.txt @ 620:5c67f674d724

more changes to rebuttal
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Sun, 09 Jan 2011 14:35:03 -0500
parents ea31fee25147
children d44c78c90669
We thank the reviewers for their thoughtful comments. Please find our responses below.

* Comparisons with shallower networks, but using unsupervised pre-training:
We will add those results to the paper. Previous work in our group with
very similar data (the InfiniteMNIST dataset) was published in JMLR in 2010
("Why Does Unsupervised Pre-training Help Deep Learning?"). The results indeed
show improvement when going from 1 to 2 and then 3 layers, even when using
unsupervised pre-training (RBM or Denoising Auto-Encoder).
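For concreteness, here is a minimal sketch (illustrative only, and not the code
used for these experiments) of the greedy layer-wise setup referred to above:
each layer is pre-trained as a denoising auto-encoder on the output of the
layers below it, before supervised fine-tuning of the whole stack. The layer
sizes, masking-noise level, and plain per-example SGD are assumptions made for
the example.

    # Illustrative sketch only; not the implementation used in the paper.
    import numpy as np

    rng = np.random.RandomState(0)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def pretrain_dae(X, n_hidden, noise=0.3, lr=0.01, n_epochs=5):
        # Train one denoising auto-encoder layer (tied weights) by SGD on a
        # cross-entropy reconstruction loss; return its encoder parameters.
        n_in = X.shape[1]
        W = rng.normal(0.0, 0.01, size=(n_in, n_hidden))
        b = np.zeros(n_hidden)            # hidden biases
        c = np.zeros(n_in)                # reconstruction biases
        for _ in range(n_epochs):
            for x in X:
                x_tilde = x * (rng.rand(n_in) > noise)   # masking corruption
                h = sigmoid(x_tilde.dot(W) + b)          # encode
                r = sigmoid(h.dot(W.T) + c)              # decode
                g = r - x                                # gradient wrt decoder pre-activation
                gh = g.dot(W) * h * (1.0 - h)            # gradient wrt encoder pre-activation
                W -= lr * (np.outer(x_tilde, gh) + np.outer(g, h))
                b -= lr * gh
                c -= lr * g
        return W, b

    def pretrain_stack(X, layer_sizes):
        # Greedy layer-wise pre-training: each new layer is trained on the
        # representation produced by the already-trained layers below it.
        params, H = [], X
        for n_hidden in layer_sizes:
            W, b = pretrain_dae(H, n_hidden)
            params.append((W, b))
            H = sigmoid(H.dot(W) + b)
        return params

    # e.g. the 1-, 2- and 3-layer networks discussed above:
    #   params = pretrain_stack(X_unlabeled, [1000] * n_layers)
    # followed by supervised fine-tuning of the stack plus an output layer.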

* Comparisons with SVMs. We have tried several kinds of SVMs. The main limitation
[...]
the suggestion about multi-stage questionnaires, we will definitely
consider this as an option next time we perform this experiment. However,
to be fair, if we were to do so, we should also consider the same
multi-stage decision process for the machine learning algorithms.

* Size of labeled set: in our JMLR 2010 paper on deep learning (cited
above), we already verified the effect of the number of labeled examples on
the deep learners and shallow learners (with or without unsupervised
pre-training); see fig. 11 of that paper, which involves data very similar
to those studied here. Basically (and somewhat surprisingly), the deep
learners with unsupervised pre-training can take more advantage of a large
number of labeled examples, presumably because of the initialization effect
(which benefits from the prior that representations that are useful for P(X)
are also useful for P(Y|X)), and the effect does not disappear when the
number of labeled examples increases. Other work in the semi-supervised
setting (Lee et al., NIPS 2009, "Unsupervised feature learning...") also shows
that the advantage of unsupervised feature learning by a deep architecture
is most pronounced in the semi-supervised setting with very few labeled
examples. Adding the training curve in the self-taught setting of this AISTATS
submission is a good idea, but it is unlikely to provide results different
from those already reported in the literature in similar settings.