Mercurial > ift6266
annotate writeup/nips_rebuttal_clean.txt @ 576:185d79636a20
now fits
author | Yoshua Bengio <bengioy@iro.umontreal.ca> |
---|---|
date | Sat, 07 Aug 2010 22:54:54 -0400 |
parents | bff9ab360ef4 |
children | 685756a11fd2 |
rev | line source |
---|---|
574
d12b9a1432e8
cleaned-up version, fewer typos, shortened (but need 700 chars less)
Dumitru Erhan <dumitru.erhan@gmail.com>
parents:
diff
changeset
|
1 |
575
bff9ab360ef4
nips_rebuttal_clean
Yoshua Bengio <bengioy@iro.umontreal.ca>
parents:
574
diff
changeset
|
2 Reviewer_1 claims that handwriting recognition is essentially solved: we |
3 believe this is not true. Yes, the best methods have been getting |
4 essentially human performance in the case of clean digits. But we are not |
5 aware of previous papers achieving human performance on the full character |
6 set. It is clear from our own experimentation (play with the demo to |
7 convince yourself) that humans still clearly outperform machines when the |
8 characters are heavily distorted (e.g. as in our NISTP dataset). |
9 |
10 |
11 "...not intended to compete with the state-of-the-art...": We had included |
12 comparisons with the state-of-the-art on the NIST dataset (and beat it). |
13 |
14 |
15 "the demonstrations that self-taught learning can help deep learners is |
16 helpful": indeed, but it is even more interesting to consider the result |
17 that self-taught learning was found *more helpful for deep learners than |
18 for shallow ones*. Since out-of-distribution data is common (especially |
19 out-of-class data), this is of practical importance. |
20 |
21 Reviewer_4, "It would also be interesting to compare to SVMs...": ordinary |
22 SVMs cannot be used on such large datasets. We will explore SVM variants |
23 such as the one suggested, so as to add SVM results to the paper. |
24 |
25 |
26 "...it would be helpful to provide some theoretical analysis...": indeed, |
576 | 27 but this appears mathematically challenging (to say the least, since deep |
28 models involve a non-convex optimization) or would likely require very |
576 | 29 strong distributional assumptions. However, previous |
30 theoretical literature already provides some answers, e.g., | |
31 Jonathan Baxter's (COLT 1995) "Learning internal | |
32 representations". The argument is about sharing capacity across | |
33 tasks to improve generalization: lower-layer features can potentially | |
34 be shared across tasks. Whereas a one-hidden-layer MLP can only share linear | |
35 features, a deep architecture can share non-linear ones which have the |
36 potential for representing more abstract concepts. |
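The capacity-sharing argument above can be illustrated with a minimal sketch. It is not the architecture used in the paper: the layer sizes, the random weights, and the two task names ("digits" and "letters") are assumptions chosen only to show two stacked non-linear layers being reused by several task-specific output layers.

```python
import math
import random

def layer(inputs, weights):
    """One fully connected tanh layer; weights holds one row per output unit."""
    return [math.tanh(sum(w * x for w, x in zip(row, inputs))) for row in weights]

def random_weights(n_out, n_in, rng):
    """Untrained random weights, used here only to make the sketch runnable."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]

rng = random.Random(0)

# Lower layers shared by every task (hypothetical sizes: 4 -> 8 -> 6).
shared = [random_weights(8, 4, rng), random_weights(6, 8, rng)]

# One task-specific output layer per task (hypothetical tasks).
heads = {
    "digits": random_weights(10, 6, rng),
    "letters": random_weights(26, 6, rng),
}

def predict(x, task):
    """Run the shared non-linear layers, then the output layer of the given task."""
    h = x
    for w in shared:
        h = layer(h, w)  # features computed here are reused by all tasks
    return layer(h, heads[task])

x = [0.1, -0.2, 0.3, 0.4]
digits_out = predict(x, "digits")
letters_out = predict(x, "letters")
```

With a single hidden layer, only the first (linear) projection would be shared before the task-specific outputs; stacking a second shared layer is what lets the tasks share non-linear features of the input.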
37 |
38 Reviewer_5 about semi-supervised learning: In the unsupervised phase, no |
576 | 39 labels are used. In the supervised fine-tuning phase, all labels are used. |
40 So this is *not* the semi-supervised setting, which was previously | |
41 studied [5], showing the advantage of depth. Instead, we focus here | |
42 on the out-of-distribution aspect of self-taught learning. | |
43 |
44 "...human errors may be present...": Indeed, there are variations across |
45 human labelings, which we have estimated (since each character was viewed |
46 by 3 different humans), and reported in the paper (the standard deviations |
47 across humans are large, but the standard error across a large test set is |
48 very small, so we believe the average error numbers to be fairly accurate). |
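To see why the standard error of an average error rate shrinks as the test set grows, consider the following minimal sketch. It uses the usual binomial approximation, which is an assumption of this illustration and not necessarily the estimator used in the paper; the error rate and set sizes are hypothetical and not taken from the paper.

```python
import math

def standard_error_of_error_rate(error_rate, n_examples):
    """Binomial standard error of an average error rate measured on n_examples."""
    return math.sqrt(error_rate * (1.0 - error_rate) / n_examples)

# Hypothetical numbers for illustration only.
small_set = standard_error_of_error_rate(0.2, 100)
large_set = standard_error_of_error_rate(0.2, 10000)
```

Multiplying the number of examples by 100 divides the standard error by 10, so the average can be estimated precisely on a large test set even when individual human labelers disagree substantially.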
49 |
576 | 50 "...supplement, but I did not have access to it...": strange! We could |
51 (and still can) access it. We will include complete pseudo-code for SDAs | |
52 in it. | |
53 |
54 "...main contributions of the manuscript...": the main contribution is |
55 actually to show that the self-taught learning setting is more beneficial |
56 to deeper architectures. |
57 |
58 "...restriction to MLPs...": that restriction was motivated by the |
576 | 59 computational challenge of training on nearly a billion examples. Linear |
60 models do not fare well here, and most non-parametric models do not scale | |
61 well, so MLPs (which have been used before on this task) were natural as | |
62 the baseline. We will explore the use of SVM approximations, as suggested | |
63 by Reviewer_1. | |
64 |
65 Reviewer_6: "...novelty [...] is somewhat marginal since [...] reminiscent of |
66 prior work on character recognition using deformations and |
576 | 67 transformations". The main originality is in showing that deep learners |
68 can take more advantage than shallow learners of such data and of the |
69 self-taught learning framework in general. |
70 |