ift6266: comparison of writeup/aistats_review_response.txt @ 620:5c67f674d724
more changes to rebuttal
author | Yoshua Bengio <bengioy@iro.umontreal.ca> |
date | Sun, 09 Jan 2011 14:35:03 -0500 |
parents | ea31fee25147 |
children | d44c78c90669 |
comparison
619:ea31fee25147 | 620:5c67f674d724 |
---|---|
1 | 1 |
2 We thank the reviewers for their thoughtful comments. Please find our responses below. | 2 We thank the reviewers for their thoughtful comments. Please find our responses below. |
3 | 3 |
4 * Comparisons with shallower networks, but using unsupervised pre-training: | 4 * Comparisons with shallower networks, but using unsupervised pre-training: |
5 We will add those results to the paper. Previous work in our group with | 5 We will add those results to the paper. Previous work in our group with |
6 very similar data (the InfiniteMNIST dataset) was published in JMLR in 20102 | 6 very similar data (the InfiniteMNIST dataset) was published in JMLR in 2010 |
7 "Why Does Unsupervised Pre-training Help Deep Learning?"). The results indeed | 7 "Why Does Unsupervised Pre-training Help Deep Learning?"). The results indeed |
8 show improvement when going from 1 to 2 and then 3 layers, even when using | 8 show improvement when going from 1 to 2 and then 3 layers, even when using |
9 unsupervised pre-training (RBM or Denoising Auto-Encoder). | 9 unsupervised pre-training (RBM or Denoising Auto-Encoder). |
10 | 10 |
11 * Comparisons with SVMs. We have tried several kinds of SVMs. The main limitation | 11 * Comparisons with SVMs. We have tried several kinds of SVMs. The main limitation |
63 the suggestion about multi-stage questionnaires, we will definitely | 63 the suggestion about multi-stage questionnaires, we will definitely |
64 consider this as an option next time we perform this experiment. However, | 64 consider this as an option next time we perform this experiment. However, |
65 to be fair, if we were to do so, we should also consider the same | 65 to be fair, if we were to do so, we should also consider the same |
66 multi-stage decision process for the machine learning algorithms as well. | 66 multi-stage decision process for the machine learning algorithms as well. |
67 | 67 |
 | 68 * Size of labeled set: in our JMLR 2010 paper on deep learning (cited |
 | 69 above), we already verified the effect of the number of labeled examples on |
 | 70 the deep learners and shallow learners (with or without unsupervised |
 | 71 pre-training); see fig. 11 of that paper, which involves data very similar |
 | 72 to those studied here. Basically (and somewhat surprisingly) the deep |
 | 73 learners with unsupervised pre-training can take more advantage of a large |
 | 74 number of labeled examples, presumably because of the initialization effect |
 | 75 (which benefits from the prior that representations that are useful for P(X) |
 | 76 are also useful for P(Y|X)), and the effect does not disappear when the |
 | 77 number of labeled examples increases. Other work in the semi-supervised |
 | 78 setting (Lee et al., NIPS 2009, "Unsupervised feature learning...") also shows |
 | 79 that the advantage of unsupervised feature learning by a deep architecture |
 | 80 is most pronounced in the semi-supervised setting with very few labeled |
 | 81 examples. Adding the training curve in the self-taught setting of this AISTATS |
 | 82 submission is a good idea, but is unlikely to provide results |
 | 83 different from those already reported in the literature in similar |
 | 84 settings. |
68 | 85 |
69 * Size of labeled set: | |
70 | |
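
As an illustration of the layer-wise protocol referred to in the response above (pre-training 1, then 2, then 3 layers with RBMs or denoising auto-encoders before supervised training), here is a minimal sketch in NumPy. It is not the code used for the paper: the toy data, layer sizes, learning rates, and the absence of full supervised fine-tuning of the stack are simplifying assumptions made only to show the structure of the procedure.

```python
# Minimal sketch of greedy layer-wise unsupervised pre-training with
# denoising auto-encoders (DAEs), followed by a supervised softmax layer.
# Illustration only: the toy data, layer sizes and learning rates below
# are assumptions, not the settings used in the paper.
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_dae_layer(X, n_hidden, corruption=0.3, lr=0.1, epochs=20):
    """Train one tied-weight denoising auto-encoder on X; return (W, b_hidden)."""
    n_visible = X.shape[1]
    W = rng.normal(scale=0.01, size=(n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    for _ in range(epochs):
        # Corrupt the input by zeroing a random subset of features.
        mask = rng.binomial(1, 1.0 - corruption, size=X.shape)
        X_tilde = X * mask
        H = sigmoid(X_tilde @ W + b_h)        # encode the corrupted input
        X_rec = sigmoid(H @ W.T + b_v)        # decode with tied weights
        # Gradient of the mean cross-entropy reconstruction error.
        d_v = (X_rec - X) / X.shape[0]
        d_h = (d_v @ W) * H * (1.0 - H)
        W -= lr * (X_tilde.T @ d_h + d_v.T @ H)
        b_h -= lr * d_h.sum(axis=0)
        b_v -= lr * d_v.sum(axis=0)
    return W, b_h

def encode(X, layers):
    """Propagate X through the stack of pre-trained encoders."""
    H = X
    for W, b in layers:
        H = sigmoid(H @ W + b)
    return H

# Toy stand-ins for the unlabeled and labeled sets (assumption, not real data).
X_unlab = rng.rand(500, 64)
X_lab = rng.rand(100, 64)
y_lab = rng.randint(0, 10, size=100)

# Greedy layer-wise pre-training: the stack grows from 1 to 2 to 3 layers.
layers = []
for n_hidden in (50, 50, 50):
    H = encode(X_unlab, layers)               # representation from layers trained so far
    layers.append(pretrain_dae_layer(H, n_hidden))

# Supervised stage: softmax regression on top of the deepest representation
# (a full fine-tuning pass would also back-propagate into the stack).
feats = encode(X_lab, layers)
n_classes = 10
V = np.zeros((feats.shape[1], n_classes))
c = np.zeros(n_classes)
Y = np.eye(n_classes)[y_lab]
for _ in range(200):
    logits = feats @ V + c
    logits -= logits.max(axis=1, keepdims=True)
    P = np.exp(logits)
    P /= P.sum(axis=1, keepdims=True)
    G = (P - Y) / feats.shape[0]
    V -= 0.5 * (feats.T @ G)
    c -= 0.5 * G.sum(axis=0)

pred = (encode(X_lab, layers) @ V + c).argmax(axis=1)
print("toy training accuracy:", (pred == y_lab).mean())
```

Input corruption by zeroing and tied encoder/decoder weights follow the standard denoising auto-encoder recipe; each new layer is trained on the representation produced by the layers already trained, which is what "going from 1 to 2 and then 3 layers" refers to in the response.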