comparison writeup/nips_rebuttal_clean.txt @ 577:685756a11fd2

removed linebreaks
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Sat, 07 Aug 2010 22:56:46 -0400
parents 185d79636a20
children 61aae4fd2da5
Reviewer_1 claims that handwriting recognition is essentially solved: we believe this is not true. Yes, the best methods have been getting essentially human performance in the case of clean digits, but we are not aware of previous papers achieving human performance on the full character set. It is clear from our own experimentation (play with the demo to convince yourself) that humans still clearly outperform machines when the characters are heavily distorted (e.g., as in our NISTP dataset).

11 "...not intended to compete with the state-of-the-art...": We had included 5 "...not intended to compete with the state-of-the-art...": We had included comparisons with the state-of-the-art on the NIST dataset (and beat it).
12 comparisons with the state-of-the-art on the NIST dataset (and beat it).
13 6
14 7
15 "the demonstrations that self-taught learning can help deep learners is 8 "the demonstrations that self-taught learning can help deep learners is helpful": indeed, but it is even more interesting to consider the result that self-taught learning was found *more helpful for deep learners than for shallow ones*. Since out-of-distribution data is common (especially out-of-class data), this is of practical importance.
16 helpful": indeed, but it is even more interesting to consider the result
17 that self-taught learning was found *more helpful for deep learners than
18 for shallow ones*. Since out-of-distribution data is common (especially
19 out-of-class data), this is of practical importance.
20 9
21 Reviewer_4, "It would also be interesting to compare to SVMs...": ordinary 10 Reviewer_4, "It would also be interesting to compare to SVMs...": ordinary SVMs cannot be used on such large datasets. We will explore SVM variants such as the suggestion made to add SVM results to the paper.
22 SVMs cannot be used on such large datasets. We will explore SVM variants
23 such as the suggestion made to add SVM results to the paper.
24 11
25 12
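To be concrete about what such a variant could look like, here is a minimal sketch (not a result from the paper) of a linear SVM trained online with stochastic gradient descent on the hinge loss, so that the data can be streamed in minibatches; the scikit-learn calls, the random placeholder data, and all sizes below are illustrative assumptions.

import numpy as np
from sklearn.linear_model import SGDClassifier

# Linear SVM trained online via SGD on the hinge loss: a sketch of a
# scalable SVM variant, not the exact setup we would report in the paper.
n_features, n_classes = 32 * 32, 62           # hypothetical sizes
clf = SGDClassifier(loss="hinge", alpha=1e-6)

def minibatches():                            # stand-in for streaming character data
    rng = np.random.RandomState(0)
    for _ in range(100):
        X = rng.rand(512, n_features)
        y = rng.randint(n_classes, size=512)
        yield X, y

classes = np.arange(n_classes)
for X, y in minibatches():
    clf.partial_fit(X, y, classes=classes)    # one SGD pass over each minibatch
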
26 "...it would be helpful to provide some theoretical analysis...": indeed, 13 "...it would be helpful to provide some theoretical analysis...": indeed, but this appears mathematically challenging (to say the least, since deep models involve a non-convex optimization) or would likely require very strong distributional assumptions. However, previous theoretical literature already provides some answers, e.g., Jonathan Baxter's (COLT 1995) "Learning internal representations". The argument is about sharing capacity across tasks to improve generalization: lower layers features can potentially be shared across tasks. Whereas a one-hidden-layer MLP can only share linear features, a deep architecture can share non-linear ones which have the potential for representing more abstract concepts.
27 but this appears mathematically challenging (to say the least, since deep
28 models involve a non-convex optimization) or would likely require very
29 strong distributional assumptions. However, previous
30 theoretical literature already provides some answers, e.g.,
31 Jonathan Baxter's (COLT 1995) "Learning internal
32 representations". The argument is about sharing capacity across
33 tasks to improve generalization: lower layers features can potentially
34 be shared across tasks. Whereas a one-hidden-layer MLP can only share linear
35 features, a deep architecture can share non-linear ones which have the
36 potential for representing more abstract concepts.
37 14
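A minimal sketch of the capacity-sharing argument (purely illustrative; the layer sizes and the ReLU non-linearity are arbitrary assumptions, not the paper's architecture): every task reuses the same stack of non-linear hidden layers, and only the top read-out weights are task-specific.

import numpy as np

rng = np.random.RandomState(0)
relu = lambda z: np.maximum(0.0, z)      # any non-linearity would do here

d_in, d_h, n_tasks, n_classes = 32 * 32, 256, 3, 62   # hypothetical sizes

# Shared non-linear feature extractor (two hidden layers).
W1 = 0.01 * rng.randn(d_in, d_h)
W2 = 0.01 * rng.randn(d_h, d_h)

# Task-specific read-outs: the only parameters that differ across tasks.
heads = [0.01 * rng.randn(d_h, n_classes) for _ in range(n_tasks)]

def predict(x, task):
    h = relu(relu(x @ W1) @ W2)          # features shared by all tasks
    return h @ heads[task]               # linear, task-specific part

scores = predict(rng.randn(d_in), task=0)
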
Reviewer_5 about semi-supervised learning: In the unsupervised phase, no labels are used. In the supervised fine-tuning phase, all labels are used. So this is *not* the semi-supervised setting, which was already previously studied [5], showing the advantage of depth. Instead, we focus here on the out-of-distribution aspect of self-taught learning.

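To make the label usage of the two phases concrete, here is a toy sketch (not the paper's pipeline; a scikit-learn RBM stands in for the unsupervised layers and the data are random placeholders): phase 1 never sees y, while phase 2 uses all of the labels.

import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.rand(1000, 64)                  # placeholder for character images in [0, 1]
y = rng.randint(10, size=1000)          # placeholder labels

# Phase 1: purely unsupervised -- the labels y are never seen here.
rbm = BernoulliRBM(n_components=128, learning_rate=0.05, n_iter=10, random_state=0)
H = rbm.fit_transform(X)

# Phase 2: supervised phase using *all* labels (in the setting described above,
# the whole stack would also be fine-tuned, which this toy sketch omits).
clf = LogisticRegression(max_iter=200).fit(H, y)
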
44 "...human errors may be present...": Indeed, there are variations across 17 "...human errors may be present...": Indeed, there are variations across human labelings, which have have estimated (since each character was viewed by 3 different humans), and reported in the paper (the standard deviations across humans are large, but the standard error across a large test set is very small, so we believe the average error numbers to be fairly accurate).
45 human labelings, which have have estimated (since each character was viewed
46 by 3 different humans), and reported in the paper (the standard deviations
47 across humans are large, but the standard error across a large test set is
48 very small, so we believe the average error numbers to be fairly accurate).
49 18
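The statistical point is just the usual scaling of the standard error of the mean with test-set size; with purely illustrative numbers (not the paper's):

import numpy as np

sigma = 0.05                     # hypothetical per-example std. dev. of human error
n_test = 100_000                 # hypothetical test-set size
standard_error = sigma / np.sqrt(n_test)   # about 1.6e-4, tiny relative to sigma
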
50 "...supplement, but I did not have access to it...": strange! We could 19 "...supplement, but I did not have access to it...": strange! We could (and still can) access it. We will include a complete pseudo-code of SDAs in it.
51 (and still can) access it. We will include a complete pseudo-code of SDAs
52 in it.
53 20
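As a generic illustration of the building block involved (not the complete pseudo-code that will go into the supplement), one denoising-autoencoder layer with a single stochastic gradient step can be sketched as follows; the hyper-parameters, squared-error loss, and linear decoder are illustrative choices.

import numpy as np

rng = np.random.RandomState(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

d_in, d_h, lr, corruption = 784, 500, 0.01, 0.25    # illustrative hyper-parameters
W = 0.01 * rng.randn(d_h, d_in); b = np.zeros(d_h)  # encoder parameters
V = 0.01 * rng.randn(d_in, d_h); c = np.zeros(d_in) # decoder parameters (untied here)

def da_step(x):
    """One SGD step of a denoising autoencoder on a single example x in [0, 1]."""
    x_tilde = x * (rng.rand(d_in) > corruption)   # masking corruption of the input
    h = sigmoid(W @ x_tilde + b)                  # hidden code from the corrupted input
    x_hat = V @ h + c                             # reconstruction (linear decoder here)
    err = x_hat - x                               # reconstruct the *clean* input
    # Back-propagate the squared reconstruction error through the two layers.
    grad_V, grad_c = np.outer(err, h), err
    dh = (V.T @ err) * h * (1.0 - h)
    grad_W, grad_b = np.outer(dh, x_tilde), dh
    for p, g in ((W, grad_W), (b, grad_b), (V, grad_V), (c, grad_c)):
        p -= lr * g                               # in-place SGD update
    return 0.5 * float(err @ err)

loss = da_step(rng.rand(d_in))
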
54 "...main contributions of the manuscript...": the main contribution is 21 "...main contributions of the manuscript...": the main contribution is actually to show that the self-taught learning setting is more beneficial to deeper architectures.
55 actually to show that the self-taught learning setting is more beneficial
56 to deeper architectures.
57 22
58 "...restriction to MLPs...": that restriction was motivated by the 23 "...restriction to MLPs...": that restriction was motivated by the computational challenge of training on nearly a billion examples. Linear models do not fare well here, and most non-parametric models do not scale well, so MLPs (which have been used before on this task) were natural as the baseline. We will explore the use of SVM approximations, as suggested by Reviewer_1.
59 computational challenge of training on nearly a billion examples. Linear
60 models do not fare well here, and most non-parametric models do not scale
61 well, so MLPs (which have been used before on this task) were natural as
62 the baseline. We will explore the use of SVM approximations, as suggested
63 by Reviewer_1.
64 24
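One example of the kind of SVM approximation we have in mind (a hedged sketch with placeholder data, not an experiment from the paper): an explicit random feature map approximates an RBF kernel, so that only a linear model has to be trained at scale.

import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X = rng.rand(2000, 32 * 32)                 # placeholder for character images
y = rng.randint(62, size=2000)              # placeholder labels over 62 classes

# Approximate an RBF-kernel SVM: random Fourier features + a linear hinge-loss model.
feat = RBFSampler(gamma=0.01, n_components=1000, random_state=0)
Z = feat.fit_transform(X)                   # uses only X, never y
clf = SGDClassifier(loss="hinge", alpha=1e-6).fit(Z, y)
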
65 "Reviewer 6:...novelty [..] is somewhat marginal since [...] reminiscent of 25 "Reviewer 6:...novelty [..] is somewhat marginal since [...] reminiscent of prior work on character recognition using deformations and transformations". Main originality = showing that deep learners can take more advantage than shallow learners of such data and of the self-taught learning framework in general.
66 prior work on character recognition using deformations and
67 transformations". Main originality = showing that deep learners
68 can take more advantage than shallow learners of such data and of the
69 self-taught learning framework in general.
70 26
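For concreteness, here is a sketch of one elementary deformation of the kind alluded to above (a smooth random displacement field applied to a character image); the parameters are illustrative and are not those used to build NISTP.

import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(img, alpha=8.0, sigma=3.0, seed=0):
    """Apply a random smooth displacement field to a 2-D character image."""
    rng = np.random.RandomState(seed)
    # Smooth random displacements in x and y (the classic elastic-deformation recipe).
    dx = gaussian_filter(rng.uniform(-1.0, 1.0, img.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1.0, 1.0, img.shape), sigma) * alpha
    yy, xx = np.meshgrid(np.arange(img.shape[0]), np.arange(img.shape[1]), indexing="ij")
    return map_coordinates(img, [yy + dy, xx + dx], order=1, mode="reflect")

distorted = elastic_distort(np.random.rand(32, 32))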