annotate writeup/nips_rebuttal_clean.txt @ 575:bff9ab360ef4

nips_rebuttal_clean
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Sat, 07 Aug 2010 22:46:12 -0400
parents d12b9a1432e8
children 185d79636a20
Reviewer_1 claims that handwriting recognition is essentially solved: we
believe this is not true. Yes, the best methods have reached essentially
human performance in the case of clean digits. But we are not aware of
previous papers achieving human performance on the full character set. It
is clear from our own experimentation (play with the demo to convince
yourself) that humans still clearly outperform machines when the
characters are heavily distorted (e.g. as in our NISTP dataset).

"...not intended to compete with the state-of-the-art...": we had
included comparisons with the state-of-the-art on the NIST dataset (and
beat it).

"the demonstrations that self-taught learning can help deep learners is
helpful": indeed, but it is even more interesting to consider that
self-taught learning was found *more helpful for deep learners than for
shallow ones*. Since out-of-distribution data is common (especially
out-of-class data), this is of practical importance.

Reviewer_4, "It would also be interesting to compare to SVMs...":
ordinary SVMs cannot be trained on such large datasets. Following this
suggestion, we will explore SVM variants and add the corresponding
results to the paper.

"...it would be helpful to provide some theoretical analysis...": indeed,
but this is either mathematically challenging (to say the least, since
deep models involve a non-convex optimization) or would likely require
very strong assumptions on the data distribution. However, there is
theoretical literature answering some basic questions about this issue,
starting with Jonathan Baxter's "Learning internal representations" (COLT
1995). The argument is about capacity, and about sharing it across tasks
so as to achieve better generalization. The lower layers implement
features that can potentially be shared across tasks. As long as some
sharing is possible (because the same features can be useful for several
tasks), there is a potential benefit from shared internal
representations. Whereas a one-hidden-layer MLP can only share linear
features, a deep architecture can share non-linear ones, which have the
potential to represent more abstract concepts.

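This sharing argument can be made concrete with a toy sketch (our own
illustration with made-up synthetic tasks, not code from the paper or
from Baxter's): two task-specific output heads read the same non-linear
hidden layer, so gradients from both tasks train the shared features.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))
# Two related synthetic tasks defined on the same inputs (illustration only).
y1 = (X[:, 0] + X[:, 1] > 0).astype(float)
y2 = (X[:, 0] - X[:, 2] > 0).astype(float)

# One shared non-linear hidden layer, two task-specific linear heads.
W = rng.normal(0.0, 0.1, (10, 16))
v1 = np.zeros(16)
v2 = np.zeros(16)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

losses = []
for _ in range(300):
    H = np.tanh(X @ W)                         # shared features
    p1, p2 = sigmoid(H @ v1), sigmoid(H @ v2)
    losses.append(
        -np.mean(y1 * np.log(p1 + 1e-9) + (1 - y1) * np.log(1 - p1 + 1e-9))
        - np.mean(y2 * np.log(p2 + 1e-9) + (1 - y2) * np.log(1 - p2 + 1e-9))
    )
    g1, g2 = p1 - y1, p2 - y2                  # logistic-loss gradients per head
    v1 -= 0.5 * H.T @ g1 / len(X)
    v2 -= 0.5 * H.T @ g2 / len(X)
    dH = (np.outer(g1, v1) + np.outer(g2, v2)) * (1 - H ** 2)
    W -= 0.5 * X.T @ dH / len(X)               # both tasks shape the shared layer
```

Because both heads backpropagate into the same `W`, the hidden features
are fit with the combined capacity of the two tasks, which is the
intuition behind the benefit of shared internal representations.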
Reviewer_5, about semi-supervised learning: in the unsupervised phase, no
labels are used. In the supervised fine-tuning phase, all labels are
used, so this is not the semi-supervised setting. This paper did not
examine the potential advantage of exploiting large quantities of
additional unlabeled data, but the availability of the generated dataset
and of the learning setup would make it easy to conduct a study answering
this interesting question. Note however that previous work [5] already
investigated the relative advantage of the semi-supervised setting for
deep vs shallow architectures, which is why we did not focus on it here.
It might still be worth running these experiments, since the deep
learning algorithms were different.

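To make the two phases concrete, here is a minimal sketch of stacked
denoising autoencoder (SDA) pretraining: the unsupervised pass below
touches no labels at all, and only the subsequent supervised fine-tuning
(not shown) uses them. This is our own simplified numpy illustration
(tied weights, squared error, full-batch steps), not the exact algorithm
or hyperparameters of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class DenoisingAutoencoder:
    """One layer: corrupt the input, encode, reconstruct, descend the error."""

    def __init__(self, n_in, n_hidden):
        self.W = rng.normal(0.0, 0.1, (n_in, n_hidden))  # tied weights
        self.b = np.zeros(n_hidden)                      # encoder bias
        self.c = np.zeros(n_in)                          # decoder bias

    def encode(self, x):
        return sigmoid(x @ self.W + self.b)

    def train_step(self, x, corruption=0.3, lr=0.1):
        mask = rng.random(x.shape) > corruption  # zero out a random fraction
        xc = x * mask
        h = self.encode(xc)
        r = sigmoid(h @ self.W.T + self.c)       # reconstruct the *clean* x
        dr = (r - x) * r * (1 - r)               # squared-error backprop
        dh = (dr @ self.W) * h * (1 - h)
        self.W -= lr * (xc.T @ dh + dr.T @ h)    # both uses of the tied W
        self.b -= lr * dh.sum(axis=0)
        self.c -= lr * dr.sum(axis=0)

def pretrain(sizes, X, epochs=5):
    """Greedy layer-wise unsupervised pretraining: no labels used anywhere."""
    layers, rep = [], X
    for n_in, n_hid in zip(sizes[:-1], sizes[1:]):
        dae = DenoisingAutoencoder(n_in, n_hid)
        for _ in range(epochs):
            dae.train_step(rep)
        layers.append(dae)
        rep = dae.encode(rep)  # this layer's code feeds the next layer
    return layers
```

After pretraining, the encoders initialize a deep MLP whose supervised
fine-tuning uses all the labels, which is why the overall setup is not
semi-supervised in the usual sense.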
"...human errors may be present...": indeed, there are variations across
human labelings, which we have estimated (since each character was viewed
by 3 different humans) and reported in the paper (the standard deviations
across humans are large, but the standard error across a large test set
is very small, so we believe the average error numbers to be fairly
accurate).

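The gap between the two kinds of variability can be sketched numerically
(the numbers below are made up for illustration, not the paper's
measurements):

```python
import math

# Hypothetical spread of error rates across individual human raters (percent).
sd_across_humans = 5.0
# Size of the test set over which the mean error is computed.
n_test = 10_000

# The standard error of the mean shrinks with the square root of n,
# so a large rater-to-rater spread still yields a precise average.
standard_error = sd_across_humans / math.sqrt(n_test)
print(standard_error)  # 0.05
```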
"...authors do cite a supplement, but I did not have access to it...":
that is strange; we could (and still can) access it from the CMT web
site. We will make sure to include complete pseudo-code for the SDAs in
it.

"...main contributions of the manuscript...": the main contribution is
actually to show that the self-taught learning setting is more beneficial
to deeper architectures.

"...restriction to MLPs...": that restriction was motivated by the
computational challenge of training on hundreds of millions of examples.
Apart from linear models (which do not fare well on this task), it is not
clear to us what else could be used, so MLPs were the obvious candidates
to compare with. We will explore the use of SVM approximations, as
suggested by Reviewer_1. Other suggestions are welcome.

Reviewer_6: "...novelty [..] is somewhat marginal since [...] reminiscent
of prior work on character recognition using deformations and
transformations": the main originality is in showing that deep learners
can take more advantage than shallow learners of such data, and of the
self-taught learning framework in general.