Mercurial > ift6266
annotate writeup/nips_rebuttal_clean.txt @ 574:d12b9a1432e8
cleaned-up version, fewer typos, shortened (but need 700 chars less)
author | Dumitru Erhan <dumitru.erhan@gmail.com> |
---|---|
date | Sat, 07 Aug 2010 18:39:36 -0700 |
parents | |
children | bff9ab360ef4 |

Reviewer_1 claims that handwriting recognition is essentially solved; we believe
that this is not true. The best methods do reach essentially human performance
on clean digits, but we are not aware of previous papers showing that human
performance has been reached on the full character set. Furthermore, our own
experiments show that humans still greatly outperform machines when the
characters are heavily distorted (e.g. the NISTP dataset). Playing with the
provided demo will quickly convince you of this.

"...not intended to compete with the state-of-the-art...": We actually included
comparisons with the state-of-the-art on the NIST dataset (and beat it).

"the demonstrations that self-taught learning can help deep learners is
helpful": indeed, but the more interesting result is that self-taught learning
was found *more helpful for deep learners than for shallow ones*. Since
out-of-distribution data (especially out-of-class data) is commonly available,
this is of practical importance.

Reviewer_4: "It would also be interesting to compare to SVMs...": ordinary SVMs
cannot be used on such large datasets, but it is indeed a good idea to explore
variants or approximations of SVMs. We will continue exploring this thread (and
the particular suggestion made) and hope to include these results in the final
paper, adding more shallow learners to the comparison.

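To give a rough sense of scale for the claim above, the following
back-of-the-envelope sketch computes the memory a dense kernel (Gram) matrix
would need, one common bottleneck of kernel SVM training. The function name,
the 8-byte entry size and the example count of 100 million are illustrative
assumptions (the rebuttal only mentions "hundreds of millions of examples").
Practical solvers cache only part of this matrix, so this is an upper-bound
picture rather than a statement about any specific implementation.

```python
def gram_matrix_bytes(n_examples, bytes_per_entry=8):
    """Memory for a dense n x n kernel matrix (hypothetical helper).

    Assumes one float64 (8 bytes) per pairwise kernel value.
    """
    return n_examples * n_examples * bytes_per_entry


if __name__ == "__main__":
    n = 100_000_000  # assumed dataset size, for illustration only
    petabytes = gram_matrix_bytes(n) / 1e15
    print(f"{n:,} examples -> about {petabytes:.0f} PB for a dense Gram matrix")
```

The quadratic growth in the number of examples is what motivates looking at
SVM variants or approximations instead.
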
"...it would be helpful to provide some theoretical analysis...": indeed, but
this is either mathematically challenging (to say the least, since deep models
involve a non-convex optimization) or would likely require very strong
assumptions on the data distribution. However, there is theoretical literature
that answers some basic questions about this issue, starting with the work of
Jonathan Baxter (COLT 1995), "Learning internal representations". The argument
is about capacity and sharing it across tasks so as to achieve better
generalization. The lower layers implement features that can potentially be
shared across tasks. As long as some sharing is possible (because the same
features can be useful for several tasks), there is a potential benefit from
shared internal representations. Whereas a one-hidden-layer MLP can only share
linear features, a deep architecture can share non-linear ones, which have the
potential to represent more abstract concepts.

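The capacity-sharing argument can be illustrated with a rough parameter count.
The sketch below compares one network whose hidden layers are shared by several
tasks against a separate network per task. The layer widths, the three-way task
split (digits, uppercase, lowercase) and the helper names are assumptions made
for illustration, not the architecture used in the paper, and parameter count is
only a crude proxy for capacity.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected net with the given layer widths."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def shared_vs_separate(n_inputs, hidden_sizes, task_output_sizes):
    """Return (shared_total, separate_total) parameter counts.

    shared_total:   one stack of hidden layers plus one output layer per task.
    separate_total: a full network (hidden layers and output layer) per task.
    """
    trunk = mlp_param_count([n_inputs] + hidden_sizes)
    last_hidden = hidden_sizes[-1]
    heads = sum(last_hidden * k + k for k in task_output_sizes)
    shared_total = trunk + heads
    separate_total = sum(
        mlp_param_count([n_inputs] + hidden_sizes + [k])
        for k in task_output_sizes
    )
    return shared_total, separate_total


if __name__ == "__main__":
    # Assumed sizes: 32x32 inputs, three hidden layers of 1000 units,
    # tasks with 10, 26 and 26 classes.
    shared, separate = shared_vs_separate(32 * 32, [1000, 1000, 1000], [10, 26, 26])
    print(f"shared hidden layers:  {shared:,} parameters")
    print(f"one network per task: {separate:,} parameters")
```

With these assumed sizes, almost all parameters sit in the shared hidden layers,
so each additional task adds only a small output layer of its own.
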
Reviewer_5 about semi-supervised learning: In the unsupervised phase, no labels
are used. In the supervised fine-tuning phase, all labels are used, so this is
not the semi-supervised setting. This paper did not examine the potential
advantage of exploiting large quantities of additional unlabeled data, but the
availability of the generated dataset and of the learning setup would make it
easy to conduct a study answering this interesting question. Note however that
previous work [5] already investigated the relative advantage of the
semi-supervised setting for deep vs shallow architectures, which is why we did
not focus on this here. It might still be worth doing these experiments because
the deep learning algorithms were different.

"...human errors may be present...": Indeed, there are variations across human
labelings, which we have estimated (since each character was viewed by 3
different humans) and reported in the paper (the standard deviations across
humans are large, but the standard error across a large test set is very small,
so we believe the average error numbers to be fairly accurate).

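The distinction drawn above between a large spread across labelers and a small
standard error of the average can be sketched as follows. The per-labeler error
rates and the test-set size are made-up numbers, and the binomial formula
assumes independent errors across test examples; the estimates in the paper may
have been computed differently.

```python
import math
import statistics


def binomial_standard_error(error_rate, n_examples):
    """Standard error of an error rate estimated from n independent examples."""
    return math.sqrt(error_rate * (1.0 - error_rate) / n_examples)


if __name__ == "__main__":
    # Hypothetical error rates of individual human labelers (not from the paper).
    labeler_error_rates = [0.10, 0.18, 0.26]
    spread = statistics.stdev(labeler_error_rates)
    mean_error = statistics.mean(labeler_error_rates)

    n_test = 10_000  # assumed number of labeled test characters
    std_err = binomial_standard_error(mean_error, n_test)

    print(f"std across labelers: {spread:.3f}")
    print(f"standard error of the mean error rate: {std_err:.4f}")
```

The standard error shrinks with the square root of the number of test examples,
which is why the average can be well estimated even when individual labelers
disagree considerably.
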
"...authors do cite a supplement, but I did not have access to it...": that is
strange. We could (and still can) access it from the CMT web site. We will make
sure to include complete pseudo-code for SDAs in it.

"...main contributions of the manuscript...": the main contribution is actually
to show that the self-taught learning setting is more beneficial to deeper
architectures.

"...restriction to MLPs...": that restriction was motivated by the computational
challenge of training on hundreds of millions of examples. Apart from linear
models (which do not fare well on this task), it is not clear to us what else
could be used, and so MLPs were the obvious candidates to compare with. We will
explore the use of SVM approximations, as suggested by Reviewer_1. Other
suggestions are welcome.

Reviewer_6: "...novelty [...] is somewhat marginal since [...] reminiscent of
prior work on character recognition using deformations and transformations".
The main originality is in showing that deep learners can take more advantage
than shallow learners of such data and of the self-taught learning framework in
general.