Mercurial > ift6266
annotate writeup/nips_rebuttal_clean.txt @ 575:bff9ab360ef4 (nips_rebuttal_clean)

author: Yoshua Bengio <bengioy@iro.umontreal.ca>
date: Sat, 07 Aug 2010 22:46:12 -0400
parent: 574:d12b9a1432e8 (Dumitru Erhan <dumitru.erhan@gmail.com>: "cleaned-up version, fewer typos, shortened (but need 700 chars less)")
children: 185d79636a20

Reviewer_1 claims that handwriting recognition is essentially solved: we believe this is not true. Yes, the best methods have achieved essentially human performance on clean digits, but we are not aware of previous papers achieving human performance on the full character set. It is clear from our own experimentation (play with the demo to convince yourself) that humans still clearly outperform machines when the characters are heavily distorted (e.g. as in our NISTP dataset).

"...not intended to compete with the state-of-the-art...": We had included comparisons with the state-of-the-art on the NIST dataset (and beat it).

"the demonstrations that self-taught learning can help deep learners is helpful": indeed, but it is even more interesting to consider the result that self-taught learning was found *more helpful for deep learners than for shallow ones*. Since out-of-distribution data is common (especially out-of-class data), this is of practical importance.

Reviewer_4, "It would also be interesting to compare to SVMs...": ordinary SVMs cannot be used on such large datasets. We will explore SVM variants, following the suggestion to add SVM results to the paper.

"...it would be helpful to provide some theoretical analysis...": indeed, but this is either mathematically challenging (to say the least, since deep models involve non-convex optimization) or would likely require very strong assumptions on the data distribution. However, there is theoretical literature that answers some basic questions about this issue, starting with the work of Jonathan Baxter (COLT 1995), "Learning internal representations". The argument is about capacity, and sharing it across tasks so as to achieve better generalization. The lower layers implement features that can potentially be shared across tasks. As long as some sharing is possible (because the same features can be useful for several tasks), there is a potential benefit from shared internal representations. Whereas a one-hidden-layer MLP can only share linear features, a deep architecture can share non-linear ones, which have the potential to represent more abstract concepts.
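
The capacity-sharing argument can be made concrete with a minimal sketch (ours, purely illustrative; the dimensions, task count, and random weights are hypothetical and not from the paper): two tasks reuse the same non-linear hidden features and differ only in their task-specific output weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared non-linear feature extractor (the "lower layers").
W_shared = rng.normal(size=(8, 4))   # input dim 8 -> 4 shared features

def features(x):
    # Non-linear features shared by all tasks.
    return np.tanh(x @ W_shared)

# Task-specific output layers: the only per-task capacity.
W_task_a = rng.normal(size=(4, 3))   # task A: 3 classes
W_task_b = rng.normal(size=(4, 5))   # task B: 5 classes

x = rng.normal(size=(2, 8))          # a batch of 2 inputs
h = features(x)                      # computed once, reused by both tasks
logits_a = h @ W_task_a
logits_b = h @ W_task_b
```

Training both tasks updates `W_shared` through gradients from both losses, which is where the generalization benefit of sharing would come from.
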
Reviewer_5 about semi-supervised learning: In the unsupervised phase, no labels are used. In the supervised fine-tuning phase, all labels are used, so this is not the semi-supervised setting. This paper did not examine the potential advantage of exploiting large quantities of additional unlabeled data, but the availability of the generated dataset and of the learning setup would make it possible to easily conduct a study to answer this interesting question. Note however that previous work [5] already investigated the relative advantage of the semi-supervised setting for deep vs. shallow architectures, which is why we did not focus on it here. It might still be worthwhile to run these experiments, because the deep learning algorithms were different.
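
For clarity, the two phases can be sketched as follows (an illustrative stand-in, not our actual setup: PCA plays the role of layer-wise unsupervised pretraining, and the toy data, sizes, and learning rate are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: 200 unlabeled points, 50 labeled ones (2 classes).
X_unlabeled = rng.normal(size=(200, 10))
X_labeled = rng.normal(size=(50, 10))
y = (X_labeled[:, 0] > 0).astype(int)      # toy labels

# --- Unsupervised phase: no labels are used ---
# PCA stands in for unsupervised pretraining of the lower layers.
Xc = X_unlabeled - X_unlabeled.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W_pre = Vt[:4].T                           # learned 10 -> 4 projection

# --- Supervised fine-tuning phase: all labels are used ---
H = X_labeled @ W_pre                      # features from pretraining
w = np.zeros(4)
for _ in range(200):                       # plain logistic regression
    p = 1.0 / (1.0 + np.exp(-(H @ w)))
    w -= 0.1 * H.T @ (p - y) / len(y)

acc = np.mean((H @ w > 0) == y)            # training accuracy of the toy model
```

A semi-supervised variant would instead mix the unlabeled data into the second phase as well, which is the comparison this paper did not run.
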
"...human errors may be present...": Indeed, there are variations across human labelings, which we have estimated (since each character was viewed by 3 different humans) and reported in the paper (the standard deviations across humans are large, but the standard error across a large test set is very small, so we believe the average error numbers to be fairly accurate).

"...authors do cite a supplement, but I did not have access to it...": that is strange. We could (and still can) access it from the CMT web site. We will make sure to include complete pseudo-code for the SDAs in it.
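
In the meantime, here is a minimal illustrative sketch of the greedy layer-wise SDA training loop (our simplification, not the supplement's pseudo-code; the squared-error reconstruction, masking noise, layer sizes, and learning rate are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_dae(X, n_hidden, corruption=0.3, lr=0.1, n_epochs=100):
    """One denoising autoencoder layer: corrupt the input, reconstruct the clean one."""
    n_in = X.shape[1]
    W = rng.normal(scale=0.1, size=(n_in, n_hidden))   # encoder weights
    b = np.zeros(n_hidden)
    V = rng.normal(scale=0.1, size=(n_hidden, n_in))   # decoder weights
    c = np.zeros(n_in)
    for _ in range(n_epochs):
        mask = rng.random(X.shape) > corruption        # masking-noise corruption
        Xc = X * mask
        H = sigmoid(Xc @ W + b)                        # encode the corrupted input
        R = H @ V + c                                  # linear reconstruction
        E = 2.0 * (R - X) / len(X)                     # d(squared loss)/dR
        dH = (E @ V.T) * H * (1.0 - H)                 # backprop through the encoder
        V -= lr * H.T @ E
        c -= lr * E.sum(axis=0)
        W -= lr * Xc.T @ dH
        b -= lr * dH.sum(axis=0)
    return W, b

# Greedy layer-wise stacking: each layer trains on the previous layer's codes.
X = rng.random((64, 16))
reps, layers = X, []
for size in (12, 8):                                   # hypothetical layer sizes
    W, b = train_dae(reps, size)
    layers.append((W, b))
    reps = sigmoid(reps @ W + b)                       # clean codes feed the next layer
# 'reps' would then be fed to a supervised output layer for fine-tuning.
```
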
"...main contributions of the manuscript...": the main contribution is actually to show that the self-taught learning setting is more beneficial to deeper architectures.

"...restriction to MLPs...": that restriction was motivated by the computational challenge of training on hundreds of millions of examples. Apart from linear models (which do not fare well on this task), it is not clear to us what else could be used, so MLPs were the obvious candidates to compare with. We will explore the use of SVM approximations, as suggested by Reviewer_1. Other suggestions are welcome.

Reviewer_6: "...novelty [...] is somewhat marginal since [...] reminiscent of prior work on character recognition using deformations and transformations": the main originality is in showing that deep learners can take more advantage than shallow learners of such data and of the self-taught learning framework in general.