writeup/nips_rebuttal_clean.txt @ 575:bff9ab360ef4 (nips_rebuttal_clean)

author:    Yoshua Bengio <bengioy@iro.umontreal.ca>
date:      Sat, 07 Aug 2010 22:46:12 -0400
parents:   d12b9a1432e8
children:  185d79636a20
Reviewer_1 claims that handwriting recognition is essentially solved: we
believe this is not true. Yes, the best methods have been getting
essentially human performance in the case of clean digits. But we are not
aware of previous papers achieving human performance on the full character
set. It is clear from our own experimentation (play with the demo to
convince yourself) that humans still clearly outperform machines when the
characters are heavily distorted (e.g. as in our NISTP dataset).

11 "...not intended to compete with the state-of-the-art...": We had included | |
11 comparisons with the state-of-the-art on the NIST dataset (and beat it). | 12 comparisons with the state-of-the-art on the NIST dataset (and beat it). |
12 | 13 |
14 | |
13 "the demonstrations that self-taught learning can help deep learners is | 15 "the demonstrations that self-taught learning can help deep learners is |
14 helpful": indeed, but it is even more interesting to consider the result that | 16 helpful": indeed, but it is even more interesting to consider the result |
15 self-taught learning was found *more helpful for deep learners than for shallow | 17 that self-taught learning was found *more helpful for deep learners than |
16 ones*. Since the availability of out-of-distribution data is common (especially | 18 for shallow ones*. Since out-of-distribution data is common (especially |
17 out-of-class data), this is of practical importance. | 19 out-of-class data), this is of practical importance. |
18 | 20 |
19 Reviewer_4: "It would also be interesting to compare to SVMs...": ordinary SVMs cannot be | 21 Reviewer_4, "It would also be interesting to compare to SVMs...": ordinary |
20 used on such large datasets, and indeed it is a good idea to explore variants of | 22 SVMs cannot be used on such large datasets. We will explore SVM variants |
21 SVMs or approximations of SVMs. We will continue exploring this thread (and the | 23 such as the suggestion made to add SVM results to the paper. |
22 particular suggestion made) and hope to include these results in the final | |
23 paper, to add more shallow learners to the comparison. | |
24 | 24 |
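As one illustration of what such an approximation could look like (a rough
sketch with hypothetical sizes and hyper-parameters, not the experiment we
will actually run), a linear SVM trained by stochastic gradient descent on
top of an explicit kernel approximation scales to datasets of this size:

    # Hypothetical sketch: approximate kernelized SVM for very large datasets.
    # Dataset sizes, shapes and hyper-parameters below are placeholders.
    import numpy as np
    from sklearn.kernel_approximation import RBFSampler  # random Fourier features
    from sklearn.linear_model import SGDClassifier       # linear SVM via SGD (hinge loss)

    rng = np.random.RandomState(0)
    X_train = rng.rand(10000, 1024)        # placeholder for 32x32 character images
    y_train = rng.randint(0, 62, 10000)    # placeholder labels (62 character classes)

    # Map inputs to an explicit feature space approximating an RBF kernel,
    # then fit a linear SVM with stochastic gradient descent on the hinge loss.
    features = RBFSampler(gamma=0.01, n_components=2000, random_state=0)
    svm = SGDClassifier(loss="hinge", alpha=1e-5)
    svm.fit(features.fit_transform(X_train), y_train)
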
25 "...it would be helpful to provide some theoretical analysis...": indeed, but | 25 |
26 this is either mathematically challenging (to say the least, since deep models | 26 "...it would be helpful to provide some theoretical analysis...": indeed, |
27 involve a non-convex optimization) or would likely require very strong | 27 but this is either mathematically challenging (to say the least, since deep |
28 assumptions on the data distribution. However, there exists | 28 models involve a non-convex optimization) or would likely require very |
29 strong assumptions on the data distribution. However, there exists | |
29 theoretical literature which answers some basic questions about this issue, | 30 theoretical literature which answers some basic questions about this issue, |
30 starting with the work of Jonathan Baxter (COLT 1995) "Learning internal | 31 starting with the work of Jonathan Baxter (COLT 1995) "Learning internal |
31 representations". The argument is about capacity | 32 representations". The argument is about capacity and sharing it across |
32 and sharing it across tasks so as to achieve better generalization. The lower | 33 tasks so as to achieve better generalization. The lower layers implement |
33 layers implement features that can potentially be shared across tasks. As long | 34 features that can potentially be shared across tasks. As long as some |
34 as some sharing is possible (because the same features can be useful for several | 35 sharing is possible (because the same features can be useful for several |
35 tasks), then there is a potential benefit from shared | 36 tasks), then there is a potential benefit from shared internal |
36 internal representations. Whereas a one-hidden-layer MLP can only share linear | 37 representations. Whereas a one-hidden-layer MLP can only share linear |
37 features, a deep architecture can share non-linear ones which have the potential | 38 features, a deep architecture can share non-linear ones which have the |
38 for representing more abstract concepts. | 39 potential for representing more abstract concepts. |
39 | 40 |
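To make the sharing argument concrete, here is a minimal illustrative
sketch (our own illustration, with made-up layer sizes, not code from the
paper) of a network whose lower layers are shared across tasks while only
the output layers are task-specific:

    # Illustrative sketch: two tasks sharing deep, non-linear lower layers.
    import numpy as np

    rng = np.random.RandomState(0)

    def layer(n_in, n_out):
        return rng.randn(n_in, n_out) * 0.01, np.zeros(n_out)

    # Shared lower layers: features potentially reusable by every task.
    W1, b1 = layer(1024, 500)
    W2, b2 = layer(500, 500)

    # Task-specific output layers (e.g., digits vs. letters).
    W_digits, b_digits = layer(500, 10)
    W_letters, b_letters = layer(500, 52)

    def shared_features(x):
        h1 = np.tanh(x @ W1 + b1)     # non-linear shared features
        return np.tanh(h1 @ W2 + b2)  # more abstract shared features

    def predict_digits(x):
        return shared_features(x) @ W_digits + b_digits

    def predict_letters(x):
        return shared_features(x) @ W_letters + b_letters
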
Reviewer_5 about semi-supervised learning: In the unsupervised phase, no
labels are used. In the supervised fine-tuning phase, all labels are used,
so this is not the semi-supervised setting. This paper did not examine the
potential advantage of exploiting large quantities of additional unlabeled
data, but the availability of the generated dataset and of the learning
setup would make it possible to easily conduct a study to answer this
interesting question. Note however that previous work [5] already
investigated the relative advantage of the semi-supervised setting for deep
vs shallow architectures, which is why we did not focus on this here. It
might still be worthwhile to do these experiments because the deep learning
algorithms were different.

51 "...human errors may be present...": Indeed, there are variations across human | 53 "...human errors may be present...": Indeed, there are variations across |
52 labelings, which have have estimated (since each character | 54 human labelings, which have have estimated (since each character was viewed |
53 was viewed by 3 different humans), and reported in the paper (the standard | 55 by 3 different humans), and reported in the paper (the standard deviations |
54 deviations across humans are large, but the standard error across a large test | 56 across humans are large, but the standard error across a large test set is |
55 set is very small, so we believe the average error numbers to be fairly | 57 very small, so we believe the average error numbers to be fairly accurate). |
56 accurate). | |
57 | 58 |
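For illustration only (with made-up numbers, not the paper's figures), the
point is simply that the standard error of a mean error rate shrinks with
the square root of the test set size, even when per-example disagreement
between labelers is large:

    # Toy illustration with made-up numbers (not the paper's figures).
    import math

    n_test = 100000        # hypothetical number of test characters
    error_rate = 0.15      # hypothetical mean human error rate
    # Standard deviation of the per-example 0/1 error indicator:
    sigma = math.sqrt(error_rate * (1 - error_rate))  # about 0.36: large
    standard_error = sigma / math.sqrt(n_test)        # about 0.001: very small
    print(sigma, standard_error)
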
58 "...authors do cite a supplement, but I did not have access to it...": that is | 59 "...authors do cite a supplement, but I did not have access to it...": that |
59 strange. We could (and still can) access it from the CMT web site. We will make | 60 is strange. We could (and still can) access it from the CMT web site. We |
60 sure to include a complete pseudo-code of SDAs in it. | 61 will make sure to include a complete pseudo-code of SDAs in it. |
61 | 62 |
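As a placeholder until then, here is a minimal sketch of the unsupervised
pre-training phase of a standard stacked denoising auto-encoder (greedy
layer-wise training with masking noise and tied weights); all sizes and
hyper-parameters are placeholders rather than the settings used in the
paper, and the supervised fine-tuning of the whole stack is only indicated
at the end:

    # Sketch of SDA pre-training (placeholder hyper-parameters, not the paper's).
    import numpy as np

    rng = np.random.RandomState(0)
    sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

    def pretrain_denoising_layer(X, n_hidden, noise=0.3, lr=0.1, epochs=5):
        """One denoising auto-encoder layer, trained to reconstruct X from a
        corrupted version of X (masking noise), with tied weights."""
        n_in = X.shape[1]
        W = rng.randn(n_in, n_hidden) * 0.01
        b, c = np.zeros(n_hidden), np.zeros(n_in)
        for _ in range(epochs):
            X_tilde = X * (rng.rand(*X.shape) > noise)  # randomly mask inputs
            H = sigmoid(X_tilde @ W + b)                # encode
            R = sigmoid(H @ W.T + c)                    # decode (tied weights)
            # Gradients of the squared reconstruction error ||R - X||^2 / 2
            dR = (R - X) * R * (1 - R)
            dH = (dR @ W) * H * (1 - H)
            W -= lr * (X_tilde.T @ dH + dR.T @ H) / len(X)
            b -= lr * dH.mean(0)
            c -= lr * dR.mean(0)
        return W, b

    # Unsupervised phase: greedily stack layers, each trained on the
    # representation produced by the previously trained layers.
    X = rng.rand(1000, 784)                             # placeholder unlabeled data
    stack, H = [], X
    for n_hidden in (500, 500, 500):
        W, b = pretrain_denoising_layer(H, n_hidden)
        stack.append((W, b))
        H = sigmoid(H @ W + b)

    # Supervised phase (not shown): add a softmax output layer on top of the
    # stack and fine-tune all parameters by back-propagation on labeled data.
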
62 "...main contributions of the manuscript...": the main | 63 "...main contributions of the manuscript...": the main contribution is |
63 contribution is actually to show that the self-taught learning setting is more | 64 actually to show that the self-taught learning setting is more beneficial |
64 beneficial to deeper architectures. | 65 to deeper architectures. |
65 | 66 |
66 "...restriction to MLPs...": that restriction was motivated by the computational | 67 "...restriction to MLPs...": that restriction was motivated by the |
67 challenge of training on hundreds of millions of examples. Apart from linear | 68 computational challenge of training on hundreds of millions of |
68 models (which do not fare well on this task), it is not clear to us what | 69 examples. Apart from linear models (which do not fare well on this task), |
69 could be used, and so MLPs were the | 70 it is not clear to us what could be used, and so MLPs were the obvious |
70 obvious candidates to compare with. We will explore the use of SVM | 71 candidates to compare with. We will explore the use of SVM approximations, |
71 approximations, as suggested by Reviewer_1. Other suggestions are welcome. | 72 as suggested by Reviewer_1. Other suggestions are welcome. |
72 | 73 |
73 "Reviewer 6:...novelty [..] is somewhat marginal since [...] reminiscent of | 74 "Reviewer 6:...novelty [..] is somewhat marginal since [...] reminiscent of |
74 prior work on character recognition using deformations and transformations". | 75 prior work on character recognition using deformations and |
75 The main originality is in showing that deep learners can take more advantage | 76 transformations". The main originality is in showing that deep learners |
76 than shallow learners of such data and of the self-taught learning framework in | 77 can take more advantage than shallow learners of such data and of the |
77 general. | 78 self-taught learning framework in general. |
78 | 79 |