annotate writeup/aistats_review_response.txt @ 616:b0cdd200b2bd

added aistats_review_response.txt
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Sun, 09 Jan 2011 12:13:45 -0500
parents
children 14ba0120baff
We thank the reviewers for their thoughtful comments. Here are our responses.

* Comparisons with shallower networks, but using unsupervised pre-training:
We will add those results to the paper. Previous results from our group on
very similar data (the InfiniteMNIST dataset) were published in JMLR in 2010
("Why Does Unsupervised Pre-training Help Deep Learning?"). Those results indeed
show improvement when going from 1 to 2 and then 3 layers, even when using
unsupervised pre-training (RBM or Denoising Auto-Encoder).

* Comparisons with SVMs. We have tried several kinds of SVMs. The main limitation,
of course, is the size of the training set. One option is to use a non-linear SVM
with a reduced training set; another is to use an online linear SVM.
A further option we have considered is to project the input non-linearly into a
high-dimensional but sparse representation and then use an online linear SVM on that space.
For this experiment we thresholded the input pixel gray levels and considered a
low-order polynomial expansion (e.g., only looking at pairs of non-zero pixels).
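As a rough illustration of this sparse quadratic expansion, the sketch below binarizes pixels and emits one feature index per active pixel plus one per pair of jointly active pixels; the threshold value and the indexing scheme are our own illustrative choices, not details from the paper. The resulting sparse index sets are the kind of input an online linear SVM (e.g., hinge-loss SGD) could consume.

```python
import numpy as np

def sparse_quadratic_features(image, threshold=0.5):
    """Threshold gray levels, then build a sparse low-order polynomial
    expansion: one index per non-zero pixel (degree-1 terms) plus one
    index per pair of non-zero pixels (degree-2 terms). Returns the set
    of active feature indices. Threshold and indexing are illustrative."""
    x = (image.ravel() >= threshold).astype(np.int8)  # thresholded pixels
    on = np.flatnonzero(x)                            # non-zero pixel indices
    d = x.size
    feats = set(on.tolist())                          # degree-1 features
    # degree-2 features: one index per pair (i, j), i < j, of active
    # pixels, offset past the d linear features so indices never collide
    for a, i in enumerate(on):
        for j in on[a + 1:]:
            feats.add(d + i * d + j)
    return feats

# toy 3x3 "image": 3 active pixels -> 3 linear + 3 pairwise features
img = np.array([[0.9, 0.1, 0.0],
                [0.0, 0.8, 0.0],
                [0.0, 0.0, 0.7]])
print(sorted(sparse_quadratic_features(img)))  # -> [0, 4, 8, 13, 17, 53]
```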
We have obtained the following results so far, all substantially worse than those
obtained with the MLP and deep nets:

SVM type     training set    input             online training  validation  test set
             type / size     features          set error        error       error
Linear SVM   NIST,  651k     original          36.62%           34.41%      42.26%
Linear SVM   NIST,  651k     sparse quadratic  30.96%           28.00%      41.28%
Linear SVM   NISTP, 800k     original          88.50%           85.24%      87.36%
Linear SVM   NISTP, 800k     sparse quadratic  81.76%           83.69%      85.56%
RBF SVM      NISTP, 100k     original          74.73%           56.57%      64.22%

The best results were obtained with the sparse quadratic input features, and
training on the CLEAN data (NIST) rather than the perturbed data (NISTP).


* Using distorted characters as the corruption process of the Denoising
Auto-Encoder (DAE): we had already performed preliminary experiments with this idea
and it did not work very well (in fact it depends on the kind of distortion
considered), i.e., it did not improve on the simpler forms of noise we used
for the AISTATS submission. We have several interpretations for this, which should
probably go (along with more extensive simulations) into another paper.
The main interpretation of those results is that the DAE learns good
features by being given as target (to reconstruct) a pattern of higher
density (according to the unknown, underlying generating distribution) than
the network input. This is how it gets to know where the density should
concentrate. Hence distortions that are *plausible* under the input distribution
(such as translation, rotation, scaling, etc.) are not very useful, whereas
corruption due to a form of noise is useful. In fact, the most useful
corruption is a very simple form of noise that guarantees that the input is much
less likely than the target, such as Gaussian noise. Another way to think
about it is to consider the symmetries involved: a corruption process should
be such that swapping input and target is very unlikely. This is
true for many kinds of noise, but not for geometric transformations
and deformations.

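The density argument can be made concrete on a toy distribution (all numbers below are our own illustrative choices, not from the paper): when data concentrate near a low-dimensional region, additive Gaussian noise produces inputs far less likely than their clean targets, whereas a distortion that is a symmetry of the distribution leaves the likelihood unchanged and so carries no signal about where density concentrates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "data manifold": a 2-D Gaussian, wide along axis 0 and nearly
# flat along axis 1 (a stand-in for the unknown generating
# distribution of character images; parameters are illustrative).
var = np.array([1.0, 0.01])

def log_density(x):
    """Log-density under N(0, diag(var))."""
    return -0.5 * np.sum(x**2 / var + np.log(2 * np.pi * var), axis=1)

X = rng.normal(0, np.sqrt(var), size=(2000, 2))  # clean targets

# Noise corruption: isotropic Gaussian noise pushes points off the
# manifold, so the corrupted input is much less likely than its target
# and "swapping input for target" is very unlikely.
X_noise = X + rng.normal(0, 0.3, X.shape)

# "Plausible" distortion: a symmetry of the distribution (a sign flip,
# standing in for translations/rotations of characters) leaves the
# density of every point exactly unchanged.
X_plaus = -X

print(log_density(X).mean())        # clean targets: high density
print(log_density(X_noise).mean())  # noise-corrupted: much lower density
print(log_density(X_plaus).mean())  # plausible distortion: identical density
```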
* Human labeling:

* Size of labeled set: