Mercurial > ift6266
annotate writeup/aistats_review_response.txt @ 621:e162e75ac5c6
merge
author:   Yoshua Bengio <bengioy@iro.umontreal.ca>
date:     Sun, 09 Jan 2011 21:33:55 -0500
parents:  5c67f674d724
children: d44c78c90669

We thank the reviewers for their thoughtful comments. Please find our responses below.

* Comparisons with shallower networks, but using unsupervised pre-training:
We will add those results to the paper. Previous work in our group with
very similar data (the InfiniteMNIST dataset) was published in JMLR in 2010
("Why Does Unsupervised Pre-training Help Deep Learning?"). The results indeed
show improvement when going from 1 to 2 and then 3 layers, even when using
unsupervised pre-training (RBM or Denoising Auto-Encoder).

* Comparisons with SVMs. We have tried several kinds of SVMs. The main limitation
of course is the size of the training set. One option is to use a non-linear SVM
with a reduced training set, and the other is to use an online linear SVM.
Another option we have considered is to project the input non-linearly into a
high-dimensional but sparse representation and then use an online linear SVM on that space.
For this experiment we have thresholded input pixel gray levels and considered a
low-order polynomial expansion (e.g. only looking at pairs of non-zero pixels).
We have obtained the following results so far, all substantially worse than those
obtained with the MLP and deep nets.

SVM type     training set   input features     online training   validation   test set
             type / size                       error             error        error
Linear SVM   NIST, 651k     original           36.62%            34.41%       42.26%
Linear SVM   NIST, 651k     sparse quadratic   30.96%            28.00%       41.28%
Linear SVM   NISTP, 800k    original           88.50%            85.24%       87.36%
Linear SVM   NISTP, 800k    sparse quadratic   81.76%            83.69%       85.56%
RBF SVM      NISTP, 100k    original           74.73%            56.57%       64.22%

The best results were obtained with the sparse quadratic input features, and
training on the CLEAN data (NIST) rather than the perturbed data (NISTP).
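The sparse quadratic expansion described above can be sketched as follows. This is a minimal illustration, not our actual pipeline: the threshold value and the index encoding for pixel pairs are assumptions, and the resulting sparse index lists would then feed an online linear SVM trained with hinge-loss SGD.

```python
import numpy as np

def sparse_quadratic_features(image, threshold=0.5):
    # Binarize pixel gray levels at a threshold, then keep the active
    # (non-zero) pixels as first-order features and every pair of active
    # pixels as second-order features. Because most pixels are zero after
    # thresholding, the expansion stays sparse even though the full
    # quadratic feature space is high-dimensional.
    x = (np.asarray(image, dtype=float).ravel() > threshold).astype(np.uint8)
    d = x.size
    active = np.flatnonzero(x)
    feats = [int(i) for i in active]           # first-order: indices 0..d-1
    for a in range(len(active)):
        i = int(active[a])
        for j in active[a + 1:]:
            feats.append(d + i * d + int(j))   # pair (i, j) -> unique index >= d
    return sorted(feats)
```

For a 3-pixel image [1.0, 0.0, 1.0], pixels 0 and 2 are active, giving first-order indices {0, 2} plus one pair index 3 + 0*3 + 2 = 5.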


* Using distorted characters as the corruption process of the Denoising
Auto-Encoder (DAE). We had already performed preliminary experiments with this idea
and it did not work very well (in fact it depends on the kind of distortion
considered), i.e., it did not improve on the simpler forms of noise we used
for the AISTATS submission. We have several interpretations for this, which should
probably go (along with more extensive simulations) into another paper.
The main interpretation of those results is that the DAE learns good
features by being given as target (to reconstruct) a pattern of higher
density (according to the unknown, underlying generating distribution) than
the network input. This is how it gets to know where the density should
concentrate. Hence distortions that are *plausible* in the input distribution
(such as translation, rotation, scaling, etc.) are not very useful, whereas
corruption due to a form of noise is useful. In fact, the most useful
corruption is a very simple form of noise that guarantees that the input is much
less likely than the target, such as Gaussian noise. Another way to think
about it is to consider the symmetries involved: a corruption process should
be such that swapping input for target is very unlikely. This is
true for many kinds of noise, but not for geometric transformations
and deformations.
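The argument above can be made concrete with a sketch of the training step: the encoder sees a Gaussian-corrupted input (almost surely less probable than the clean pattern under the data distribution) and must reconstruct the clean target. This is a minimal tied-weight sigmoid DAE under assumed hyperparameters, not our actual training code.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def corrupt_gaussian(x, sigma=0.3):
    # Additive isotropic Gaussian noise: the corrupted input is almost
    # surely less probable than the clean target, which is the asymmetry
    # the interpretation above relies on.
    return x + sigma * rng.standard_normal(x.shape)

def dae_step(W, b, c, x, lr=0.1, sigma=0.3):
    # One SGD step of a tied-weight sigmoid DAE with squared
    # reconstruction error on a single example (parameters updated
    # in place; learning rate and noise level are assumptions).
    xt = corrupt_gaussian(x, sigma)
    h = sigmoid(W @ xt + b)            # encoder on the CORRUPTED input
    z = sigmoid(W.T @ h + c)           # tied-weight decoder
    err = z - x                        # reconstruct the CLEAN target
    dz = err * z * (1 - z)             # decoder pre-activation gradient
    dh = (W @ dz) * h * (1 - h)        # encoder pre-activation gradient
    W -= lr * (np.outer(dh, xt) + np.outer(h, dz))
    b -= lr * dh
    c -= lr * dz
    return float(np.mean(err ** 2))
```

Replacing `corrupt_gaussian` with a plausible geometric distortion breaks the asymmetry: the corrupted input is then about as probable as the target, so swapping input for target is no longer unlikely.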

* Human labeling: We controlled noise in the labeling process by (1)
requiring AMT workers with a higher than normal average of accepted
responses (>95%) on other tasks, (2) discarding responses that were not
complete (10 predictions), (3) discarding responses for which the
time to predict was smaller than 3 seconds for NIST (the mean response time
was 20 seconds) and 6 seconds for NISTP (average response time of
45 seconds), and (4) discarding responses which were obviously wrong (10
identical ones, or "12345..."). Overall, after such filtering, we kept
approximately 95% of the AMT workers' responses. We thank the reviewer for
the suggestion about multi-stage questionnaires; we will definitely
consider this as an option next time we perform this experiment. However,
to be fair, if we were to do so, we should also consider the same
multi-stage decision process for the machine learning algorithms as well.
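The four filtering rules can be sketched as a predicate on one response. Field names ('approval_rate', 'predictions', 'seconds') are illustrative placeholders, not the actual pipeline's schema.

```python
def keep_response(resp, dataset="NIST"):
    # Apply the four filtering rules described above to one AMT response:
    # worker quality, completeness, response time, and obviously-wrong answers.
    min_seconds = {"NIST": 3.0, "NISTP": 6.0}[dataset]
    preds = list(resp["predictions"])
    if resp["approval_rate"] <= 0.95:      # (1) accepted-response rate > 95%
        return False
    if len(preds) != 10:                   # (2) incomplete (need 10 predictions)
        return False
    if resp["seconds"] < min_seconds:      # (3) implausibly fast response
        return False
    if len(set(preds)) == 1:               # (4) 10 identical answers
        return False
    if "".join(preds) == "1234567890":     # (4) lazy "12345..." sequence
        return False
    return True
```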

* Size of labeled set: in our JMLR 2010 paper on deep learning (cited
above), we already verified the effect of the number of labeled examples on
deep learners and shallow learners (with or without unsupervised
pre-training); see fig. 11 of that paper, which involves data very similar
to those studied here. Basically (and somewhat surprisingly) the deep
learners with unsupervised pre-training can take more advantage of a large
amount of labeled examples, presumably because of the initialization effect
(which benefits from the prior that representations that are useful for P(X)
are also useful for P(Y|X)), and the effect does not disappear when the
number of labeled examples increases. Other work in the semi-supervised
setting (Lee et al., NIPS 2009, "Unsupervised feature learning...") also shows
that the advantage of unsupervised feature learning by a deep architecture
is most pronounced in the semi-supervised setting with very few labeled
examples. Adding the training curve in the self-taught settings of this AISTATS
submission is a good idea, but probably unlikely to provide results
different from those already reported in the literature in similar
settings.