annotate writeup/nips_rebuttal_clean.txt @ 576:185d79636a20 ("now fits")
author: Yoshua Bengio <bengioy@iro.umontreal.ca>
date:   Sat, 07 Aug 2010 22:54:54 -0400

Reviewer_1 claims that handwriting recognition is essentially solved: we believe this is not true. Yes, the best methods have reached essentially human performance on clean digits, but we are not aware of previous papers achieving human performance on the full character set. It is clear from our own experimentation (play with the demo to convince yourself) that humans still clearly outperform machines when the characters are heavily distorted (e.g., as in our NISTP dataset).

"...not intended to compete with the state-of-the-art...": We had included comparisons with the state-of-the-art on the NIST dataset (and beat it).

"the demonstrations that self-taught learning can help deep learners is helpful": indeed, but it is even more interesting to consider the result that self-taught learning was found *more helpful for deep learners than for shallow ones*. Since out-of-distribution data is common (especially out-of-class data), this is of practical importance.

Reviewer_4, "It would also be interesting to compare to SVMs...": ordinary SVMs cannot be used on such large datasets. Following the reviewer's suggestion, we will explore SVM variants and add SVM results to the paper.

"...it would be helpful to provide some theoretical analysis...": indeed, but this appears mathematically challenging (to say the least, since deep models involve a non-convex optimization) or would likely require very strong distributional assumptions. However, previous theoretical literature already provides some answers, e.g., Jonathan Baxter's "Learning internal representations" (COLT 1995). The argument is about sharing capacity across tasks to improve generalization: lower-layer features can potentially be shared across tasks. Whereas a one-hidden-layer MLP can only share linear features, a deep architecture can share non-linear ones, which have the potential for representing more abstract concepts.

Reviewer_5 about semi-supervised learning: In the unsupervised phase, no labels are used. In the supervised fine-tuning phase, all labels are used. So this is *not* the semi-supervised setting, which was already studied [5], showing the advantage of depth. Instead, we focus here on the out-of-distribution aspect of self-taught learning.

"...human errors may be present...": Indeed, there are variations across human labelings, which we have estimated (since each character was viewed by 3 different humans) and reported in the paper (the standard deviations across humans are large, but the standard error across a large test set is very small, so we believe the average error numbers to be fairly accurate).

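The large-std / small-standard-error point can be checked directly. The sketch below uses hypothetical numbers (a made-up 4% error rate, not the paper's data) to show how the standard error of the mean shrinks with the square root of the test-set size:

```python
import math

# Hypothetical per-character error indicators over N test cases
# (1 = misclassified, 0 = correct). Illustrative only, not the paper's data.
N = 100_000
errors = [1 if i % 25 == 0 else 0 for i in range(N)]  # ~4% error rate

mean = sum(errors) / N
# Sample standard deviation of the 0/1 indicators: large for a Bernoulli variable.
std = math.sqrt(sum((e - mean) ** 2 for e in errors) / (N - 1))
# Standard error of the mean shrinks with sqrt(N): tiny on a large test set.
sem = std / math.sqrt(N)

print(f"mean error ~ {mean:.4f}, std ~ {std:.3f}, standard error ~ {sem:.5f}")
```

So even though individual labels (and raters) disagree a lot, the average error rate is pinned down to a fraction of a percentage point.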
"...supplement, but I did not have access to it...": strange! We could (and still can) access it. We will include complete pseudo-code for SDAs in it.

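For concreteness, here is a minimal sketch of greedy SDA (stacked denoising autoencoder) pretraining, with tied weights, masking noise, and hypothetical layer sizes; this is an illustrative NumPy reconstruction of the general technique, not the pseudo-code promised for the supplement:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_denoising_layer(X, n_hidden, corruption=0.3, lr=0.1, epochs=5):
    """One greedy layer: corrupt the input, encode, decode with tied
    weights, and descend the squared reconstruction error."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b = np.zeros(n_hidden)   # encoder bias
    c = np.zeros(n_in)       # decoder bias
    for _ in range(epochs):
        X_tilde = X * (rng.random(X.shape) > corruption)  # masking noise
        H = sigmoid(X_tilde @ W + b)   # encode the corrupted input
        R = sigmoid(H @ W.T + c)       # reconstruct the clean input
        # Backprop of mean squared reconstruction error
        dR = (R - X) * R * (1 - R) / len(X)
        dH = (dR @ W) * H * (1 - H)
        W -= lr * (X_tilde.T @ dH + dR.T @ H)  # encode + decode gradients
        b -= lr * dH.sum(axis=0)
        c -= lr * dR.sum(axis=0)
    return W, b

# Greedy stacking: each layer is trained on the previous layer's code.
X = rng.random((256, 64))              # hypothetical data
sizes, reps = [32, 16], X
weights = []
for n_hidden in sizes:
    W, b = pretrain_denoising_layer(reps, n_hidden)
    weights.append((W, b))
    reps = sigmoid(reps @ W + b)
print(reps.shape)  # → (256, 16)
```

After pretraining, the stacked encoders would be topped with a classifier and the whole network fine-tuned with the labels, as described in the paper.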
"...main contributions of the manuscript...": the main contribution is actually to show that the self-taught learning setting is more beneficial to deeper architectures.

"...restriction to MLPs...": that restriction was motivated by the computational challenge of training on nearly a billion examples. Linear models do not fare well here, and most non-parametric models do not scale well, so MLPs (which have been used before on this task) were a natural baseline. We will explore the use of SVM approximations, as suggested by Reviewer_1.

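To illustrate why approximations matter here: kernel SVM training scales super-linearly in the number of examples, while a primal hinge-loss SVM trained by stochastic gradient descent touches each example in constant time. The sketch below shows one such variant (a Pegasos-style update on hypothetical toy data); it is only an example of the family of scalable SVM approximations, not necessarily the one we will use:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_linear_svm(X, y, lam=0.01, epochs=5):
    """Primal linear SVM trained with Pegasos-style SGD on the hinge loss.
    Each update processes one example, so cost grows linearly with the data."""
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * (X[i] @ w + b)
            w *= (1 - eta * lam)           # shrink from the L2 penalty
            if margin < 1:                 # hinge loss is active
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

# Hypothetical linearly separable toy data with labels in {-1, +1}
X = rng.normal(size=(2000, 10))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w, b = sgd_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
```

A kernelized model would need the full Gram matrix (or a working set over it), which is what becomes prohibitive at our dataset sizes.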
Reviewer 6: "novelty [...] is somewhat marginal since [...] reminiscent of prior work on character recognition using deformations and transformations": the main originality is in showing that deep learners can take more advantage than shallow learners of such data, and of the self-taught learning framework in general.