Mercurial > ift6266
annotate writeup/nips_rebuttal_clean.txt @ 575:bff9ab360ef4 (nips_rebuttal_clean)

author: Yoshua Bengio <bengioy@iro.umontreal.ca>
date: Sat, 07 Aug 2010 22:46:12 -0400
parent: 574:d12b9a1432e8 (Dumitru Erhan <dumitru.erhan@gmail.com>: "cleaned-up version, fewer typos, shortened (but need 700 chars less)")
children: 185d79636a20

Reviewer_1 claims that handwriting recognition is essentially solved: we believe this is not true. Yes, the best methods have achieved essentially human performance on clean digits, but we are not aware of previous papers achieving human performance on the full character set. It is clear from our own experimentation (play with the demo to convince yourself) that humans still clearly outperform machines when the characters are heavily distorted (e.g. as in our NISTP dataset).

"...not intended to compete with the state-of-the-art...": We had included comparisons with the state-of-the-art on the NIST dataset (and beat it).

"the demonstrations that self-taught learning can help deep learners is helpful": indeed, but it is even more interesting to consider the result that self-taught learning was found *more helpful for deep learners than for shallow ones*. Since out-of-distribution data is common (especially out-of-class data), this is of practical importance.

Reviewer_4, "It would also be interesting to compare to SVMs...": ordinary SVMs cannot be used on such large datasets. We will explore SVM variants, following the suggestion to add SVM results to the paper.

"...it would be helpful to provide some theoretical analysis...": indeed, but this is either mathematically challenging (to say the least, since deep models involve non-convex optimization) or would likely require very strong assumptions on the data distribution. However, there is theoretical literature that answers some basic questions about this issue, starting with the work of Jonathan Baxter (COLT 1995), "Learning internal representations". The argument is about capacity, and sharing it across tasks so as to achieve better generalization. The lower layers implement features that can potentially be shared across tasks. As long as some sharing is possible (because the same features can be useful for several tasks), there is a potential benefit from shared internal representations. Whereas a one-hidden-layer MLP can only share linear features, a deep architecture can share non-linear ones, which have the potential to represent more abstract concepts.
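
The capacity-sharing argument can be made concrete with a minimal sketch (ours, purely illustrative; the dimensions, task count, and random weights are hypothetical and not from the paper): two tasks reuse the same non-linear hidden features and differ only in their task-specific output weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared non-linear feature extractor (the "lower layers").
W_shared = rng.normal(size=(8, 4))   # input dim 8 -> 4 shared features

def features(x):
    # Non-linear features shared by all tasks.
    return np.tanh(x @ W_shared)

# Task-specific output layers: the only per-task capacity.
W_task_a = rng.normal(size=(4, 3))   # task A: 3 classes
W_task_b = rng.normal(size=(4, 5))   # task B: 5 classes

x = rng.normal(size=(2, 8))          # a batch of 2 inputs
h = features(x)                      # computed once, reused by both tasks
logits_a = h @ W_task_a
logits_b = h @ W_task_b
```

Training both tasks updates `W_shared` through gradients from both losses, which is where the generalization benefit of sharing would come from.
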
Reviewer_5 about semi-supervised learning: In the unsupervised phase, no labels are used. In the supervised fine-tuning phase, all labels are used, so this is not the semi-supervised setting. This paper did not examine the potential advantage of exploiting large quantities of additional unlabeled data, but the availability of the generated dataset and of the learning setup would make it possible to easily conduct a study to answer this interesting question. Note however that previous work [5] already investigated the relative advantage of the semi-supervised setting for deep vs. shallow architectures, which is why we did not focus on it here. It might still be worthwhile to run these experiments, because the deep learning algorithms were different.
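
For clarity, the two phases can be sketched as follows (an illustrative stand-in, not our actual setup: PCA plays the role of layer-wise unsupervised pretraining, and the toy data, sizes, and learning rate are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 unlabeled points, 50 labeled ones (2 classes).
X_unlabeled = rng.normal(size=(200, 10))
X_labeled = rng.normal(size=(50, 10))
y = (X_labeled[:, 0] > 0).astype(int)      # toy labels

# --- Unsupervised phase: no labels are used ---
# PCA stands in for unsupervised pretraining of the lower layers.
Xc = X_unlabeled - X_unlabeled.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W_pre = Vt[:4].T                           # learned 10 -> 4 projection

# --- Supervised fine-tuning phase: all labels are used ---
H = X_labeled @ W_pre                      # features from pretraining
w = np.zeros(4)
for _ in range(200):                       # plain logistic regression
    p = 1.0 / (1.0 + np.exp(-(H @ w)))
    w -= 0.1 * H.T @ (p - y) / len(y)

acc = np.mean((H @ w > 0) == y)            # training accuracy of the toy model
```

A semi-supervised variant would instead mix the unlabeled data into the second phase as well, which is the comparison this paper did not run.
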
"...human errors may be present...": Indeed, there are variations across human labelings, which we have estimated (since each character was viewed by 3 different humans) and reported in the paper (the standard deviations across humans are large, but the standard error across a large test set is very small, so we believe the average error numbers to be fairly accurate).

"...authors do cite a supplement, but I did not have access to it...": that is strange. We could (and still can) access it from the CMT web site. We will make sure to include complete pseudo-code for the SDAs in it.
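
In the meantime, here is a minimal illustrative sketch of the greedy layer-wise SDA training loop (our simplification, not the supplement's pseudo-code; the squared-error reconstruction, masking noise, layer sizes, and learning rate are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden, corruption=0.3, lr=0.1, n_epochs=100):
    """One denoising autoencoder layer: corrupt the input, reconstruct the clean one."""
    n_in = X.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, n_hidden))   # encoder weights
    b = np.zeros(n_hidden)
    V = rng.normal(scale=0.1, size=(n_hidden, n_in))   # decoder weights
    c = np.zeros(n_in)
    for _ in range(n_epochs):
        mask = rng.random(X.shape) > corruption        # masking-noise corruption
        Xc = X * mask
        H = sigmoid(Xc @ W + b)                        # encode the corrupted input
        R = H @ V + c                                  # linear reconstruction
        E = 2.0 * (R - X) / len(X)                     # d(squared loss)/dR
        dH = (E @ V.T) * H * (1.0 - H)                 # backprop through the encoder
        V -= lr * H.T @ E
        c -= lr * E.sum(axis=0)
        W -= lr * Xc.T @ dH
        b -= lr * dH.sum(axis=0)
    return W, b

# Greedy layer-wise stacking: each layer trains on the previous layer's codes.
X = rng.random((64, 16))
reps, layers = X, []
for size in (12, 8):                                   # hypothetical layer sizes
    W, b = train_dae(reps, size)
    layers.append((W, b))
    reps = sigmoid(reps @ W + b)                       # clean codes feed the next layer
# 'reps' would then be fed to a supervised output layer for fine-tuning.
```
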
"...main contributions of the manuscript...": the main contribution is actually to show that the self-taught learning setting is more beneficial to deeper architectures.

"...restriction to MLPs...": that restriction was motivated by the computational challenge of training on hundreds of millions of examples. Apart from linear models (which do not fare well on this task), it is not clear to us what else could be used, so MLPs were the obvious candidates to compare with. We will explore the use of SVM approximations, as suggested by Reviewer_1. Other suggestions are welcome.

Reviewer_6: "...novelty [...] is somewhat marginal since [...] reminiscent of prior work on character recognition using deformations and transformations": the main originality is in showing that deep learners can take more advantage than shallow learners of such data and of the self-taught learning framework in general.