Mercurial > ift6266
annotate writeup/aistats_review_response.txt @ 623:d44c78c90669
entered revisions for AMT and SVMs
author: Yoshua Bengio <bengioy@iro.umontreal.ca>
date: Sun, 09 Jan 2011 22:00:39 -0500
parents: 5c67f674d724
children: 49933073590c

We thank the reviewers for their thoughtful comments. Please find our responses below.

* Comparisons with shallower networks, but using unsupervised pre-training:
We will add those results to the paper. Previous work in our group with
very similar data (the InfiniteMNIST dataset) was published in JMLR in 2010
("Why Does Unsupervised Pre-training Help Deep Learning?"). The results indeed
show improvement when going from 1 to 2 and then 3 layers, even when using
unsupervised pre-training (RBM or Denoising Auto-Encoder).

* Comparisons with SVMs. We have tried several kinds of SVMs. The main limitation
of course is the size of the training set. One option is to use a non-linear SVM
with a reduced training set, and another is to use an online linear SVM.
A third option we have considered is to project the input non-linearly into a
high-dimensional but sparse representation, and then use an online linear SVM on that space.
For this experiment we thresholded the input pixel gray levels and considered a
low-order polynomial expansion (e.g., only looking at pairs of non-zero pixels).
We have obtained the following results so far, all substantially worse than those
obtained with the MLP and deep nets.

SVM type, training set, size, input features, online training set error, validation error, test set error
Linear SVM, NIST, 651k, original, 36.62%, 34.41%, 42.26%
Linear SVM, NIST, 651k, sparse quadratic, 30.96%, 28.00%, 41.28%
Linear SVM, NISTP, 800k, original, 88.50%, 85.24%, 87.36%
Linear SVM, NISTP, 800k, sparse quadratic, 81.76%, 83.69%, 85.56%
RBF SVM, NISTP, 100k, original, 74.73%, 56.57%, 64.22%

The best results were obtained with the sparse quadratic input features, and
training on the CLEAN data (NIST) rather than the perturbed data (NISTP).
A summary of the above results was added to the revised paper.

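To make the sparse quadratic input features concrete, here is a minimal sketch of one way to build them (illustrative code only, not the code used for the experiments; the function name and the pair-index mapping are our own):

```python
import numpy as np

def sparse_quadratic_features(image, threshold=0.5):
    """Map an image to the indices of its active sparse quadratic features.

    The first d coordinates of the (implicit) feature space are the
    binarized pixels; the remaining d*(d-1)/2 coordinates index each
    unordered pair of simultaneously active pixels.
    """
    x = np.ravel(np.asarray(image)) > threshold   # threshold gray levels
    d = x.size
    active = np.flatnonzero(x)
    idx = list(active)                            # linear (degree-1) features
    for pos, i in enumerate(active):
        for j in active[pos + 1:]:
            # compact index of the pair (i, j), i < j, in the upper triangle
            idx.append(d + i * (2 * d - i - 1) // 2 + (j - i - 1))
    return idx
```

Such index lists stay sparse even though the feature space is high-dimensional, so they can be fed to an online linear SVM (e.g., hinge-loss stochastic gradient descent over a sparse matrix).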
* Using distorted characters as the corruption process of the Denoising
Auto-Encoder (DAE). We had already performed preliminary experiments with this idea
and it did not work very well (in fact it depends on the kind of distortion
considered), i.e., it did not improve on the simpler forms of noise we used
for the AISTATS submission. We have several interpretations for this, which should
probably go (along with more extensive simulations) into another paper.
The main interpretation for those results is that the DAE learns good
features by being given as target (to reconstruct) a pattern of higher
density (according to the unknown, underlying generating distribution) than
the network input. This is how it gets to know where the density should
concentrate. Hence distortions that are *plausible* in the input distribution
(such as translation, rotation, scaling, etc.) are not very useful, whereas
corruption due to a form of noise is useful. In fact, the most useful
corruption is a very simple form of noise that guarantees that the input is much
less likely than the target, such as Gaussian noise. Another way to think
about it is to consider the symmetries involved. A corruption process should
be such that swapping input for target is very unlikely: this is
true for many kinds of noise, but not for geometric transformations
and deformations.

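The asymmetry argument can be illustrated with the corruption step itself: under additive Gaussian noise the corrupted input is almost surely less probable than the clean target, whereas a rotated digit is about as plausible as the original. A minimal sketch (names and shapes are illustrative, not our experimental code):

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_corrupt(x, sigma=0.3):
    """DAE corruption step: additive Gaussian noise makes the input almost
    surely less probable (under the data distribution) than the clean target."""
    return x + rng.normal(0.0, sigma, size=x.shape)

# A DAE training pair: the encoder/decoder sees `noisy`, while the
# reconstruction loss compares its output to `clean`.
clean = rng.random((16, 784))   # a hypothetical batch of flattened images
noisy = gaussian_corrupt(clean)
```

Swapping `noisy` and `clean` in such a pair is easy to detect, which is exactly the property that geometric distortions lack.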
* Human labeling: We controlled noise in the labeling process by (1)
requiring AMT workers with a higher than normal average of accepted
responses (>95%) on other tasks, (2) discarding responses that were not
complete (10 predictions), (3) discarding responses for which the
time to predict was smaller than 3 seconds for NIST (the mean response time
was 20 seconds) and 6 seconds for NISTP (average response time of
45 seconds), and (4) discarding responses which were obviously wrong (10
identical ones, or "12345..."). Overall, after such filtering, we kept
approximately 95% of the AMT workers' responses. The above paragraph
was added to the revision. We thank the reviewer for
the suggestion about multi-stage questionnaires; we will definitely
consider this as an option next time we perform this experiment. However,
to be fair, if we were to do so, we should also consider the same
multi-stage decision process for the machine learning algorithms as well.

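The four filtering rules above amount to a simple per-response predicate, sketched below (the record layout and field names are hypothetical; this is an illustration of the criteria, not the script we ran):

```python
def keep_response(resp, dataset="NIST"):
    """Apply the four AMT filtering rules. `resp` is assumed to be a dict
    with keys 'approval_rate' (worker's rate of accepted responses on other
    tasks), 'predictions' (list of 10 predicted characters) and 'seconds'
    (response time); these field names are illustrative."""
    min_time = 3.0 if dataset == "NIST" else 6.0   # NISTP threshold is 6 s
    if resp["approval_rate"] <= 0.95:              # (1) worker quality
        return False
    if len(resp["predictions"]) != 10:             # (2) incomplete response
        return False
    if resp["seconds"] < min_time:                 # (3) answered too fast
        return False
    preds = resp["predictions"]
    if len(set(preds)) == 1:                       # (4a) 10 identical answers
        return False
    if "".join(preds).startswith("12345"):         # (4b) "12345..." pattern
        return False
    return True
```

Running such a predicate over all collected responses kept roughly 95% of them, as reported above.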
* Size of labeled set: in our JMLR 2010 paper on deep learning (cited
above), we already verified the effect of the number of labeled examples on
deep learners and shallow learners (with or without unsupervised
pre-training); see fig. 11 of that paper, which involves data very similar
to those studied here. Basically (and somewhat surprisingly) the deep
learners with unsupervised pre-training can take more advantage of a large
amount of labeled examples, presumably because of the initialization effect
(which benefits from the prior that representations that are useful for P(X)
are also useful for P(Y|X)), and the effect does not disappear when the
number of labeled examples increases. Other work in the semi-supervised
setting (Lee et al., NIPS 2009, "Unsupervised feature learning...") also shows
that the advantage of unsupervised feature learning by a deep architecture
is most pronounced in the semi-supervised setting with very few labeled
examples. Adding the training curve in the self-taught setting of this AISTATS
submission is a good idea, but it is unlikely to provide results
different from those already reported in the literature in similar
settings.