annotate writeup/nips_rebuttal_clean.txt @ 576:185d79636a20 ("now fits")
author: Yoshua Bengio <bengioy@iro.umontreal.ca>
date:   Sat, 07 Aug 2010 22:54:54 -0400

Reviewer_1 claims that handwriting recognition is essentially solved: we believe this is not true. Yes, the best methods have reached essentially human performance on clean digits, but we are not aware of previous papers achieving human performance on the full character set. It is clear from our own experimentation (play with the demo to convince yourself) that humans still clearly outperform machines when the characters are heavily distorted (e.g., as in our NISTP dataset).

"...not intended to compete with the state-of-the-art...": We had included comparisons with the state-of-the-art on the NIST dataset (and beat it).

"the demonstrations that self-taught learning can help deep learners is helpful": indeed, but it is even more interesting to consider the result that self-taught learning was found *more helpful for deep learners than for shallow ones*. Since out-of-distribution data is common (especially out-of-class data), this is of practical importance.

Reviewer_4, "It would also be interesting to compare to SVMs...": ordinary SVMs cannot be used on such large datasets. Following the reviewer's suggestion, we will explore SVM variants and add SVM results to the paper.

"...it would be helpful to provide some theoretical analysis...": indeed, but this appears mathematically challenging (to say the least, since deep models involve a non-convex optimization) or would likely require very strong distributional assumptions. However, previous theoretical literature already provides some answers, e.g., Jonathan Baxter's "Learning internal representations" (COLT 1995). The argument is about sharing capacity across tasks to improve generalization: lower-layer features can potentially be shared across tasks. Whereas a one-hidden-layer MLP can only share linear features, a deep architecture can share non-linear ones, which have the potential for representing more abstract concepts.

Reviewer_5 about semi-supervised learning: In the unsupervised phase, no labels are used. In the supervised fine-tuning phase, all labels are used. So this is *not* the semi-supervised setting, which was already studied [5], showing the advantage of depth. Instead, we focus here on the out-of-distribution aspect of self-taught learning.

"...human errors may be present...": Indeed, there are variations across human labelings, which we have estimated (since each character was viewed by 3 different humans) and reported in the paper (the standard deviations across humans are large, but the standard error across a large test set is very small, so we believe the average error numbers to be fairly accurate).

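The large-std / small-standard-error point can be checked directly. The sketch below uses hypothetical numbers (a made-up 4% error rate, not the paper's data) to show how the standard error of the mean shrinks with the square root of the test-set size:

```python
import math

# Hypothetical per-character error indicators over N test cases
# (1 = misclassified, 0 = correct). Illustrative only, not the paper's data.
N = 100_000
errors = [1 if i % 25 == 0 else 0 for i in range(N)]  # ~4% error rate

mean = sum(errors) / N
# Sample standard deviation of the 0/1 indicators: large for a Bernoulli variable.
std = math.sqrt(sum((e - mean) ** 2 for e in errors) / (N - 1))
# Standard error of the mean shrinks with sqrt(N): tiny on a large test set.
sem = std / math.sqrt(N)

print(f"mean error ~ {mean:.4f}, std ~ {std:.3f}, standard error ~ {sem:.5f}")
```

So even though individual labels (and raters) disagree a lot, the average error rate is pinned down to a fraction of a percentage point.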
"...supplement, but I did not have access to it...": strange! We could (and still can) access it. We will include complete pseudo-code for SDAs in it.

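For concreteness, here is a minimal sketch of greedy SDA (stacked denoising autoencoder) pretraining, with tied weights, masking noise, and hypothetical layer sizes; this is an illustrative NumPy reconstruction of the general technique, not the pseudo-code promised for the supplement:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_denoising_layer(X, n_hidden, corruption=0.3, lr=0.1, epochs=5):
    """One greedy layer: corrupt the input, encode, decode with tied
    weights, and descend the squared reconstruction error."""
    n_in = X.shape[1]
    W = rng.normal(0.0, 0.1, size=(n_in, n_hidden))
    b = np.zeros(n_hidden)   # encoder bias
    c = np.zeros(n_in)       # decoder bias
    for _ in range(epochs):
        X_tilde = X * (rng.random(X.shape) > corruption)  # masking noise
        H = sigmoid(X_tilde @ W + b)   # encode the corrupted input
        R = sigmoid(H @ W.T + c)       # reconstruct the clean input
        # Backprop of mean squared reconstruction error
        dR = (R - X) * R * (1 - R) / len(X)
        dH = (dR @ W) * H * (1 - H)
        W -= lr * (X_tilde.T @ dH + dR.T @ H)  # encode + decode gradients
        b -= lr * dH.sum(axis=0)
        c -= lr * dR.sum(axis=0)
    return W, b

# Greedy stacking: each layer is trained on the previous layer's code.
X = rng.random((256, 64))              # hypothetical data
sizes, reps = [32, 16], X
weights = []
for n_hidden in sizes:
    W, b = pretrain_denoising_layer(reps, n_hidden)
    weights.append((W, b))
    reps = sigmoid(reps @ W + b)
print(reps.shape)  # → (256, 16)
```

After pretraining, the stacked encoders would be topped with a classifier and the whole network fine-tuned with the labels, as described in the paper.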
"...main contributions of the manuscript...": the main contribution is actually to show that the self-taught learning setting is more beneficial to deeper architectures.

"...restriction to MLPs...": that restriction was motivated by the computational challenge of training on nearly a billion examples. Linear models do not fare well here, and most non-parametric models do not scale well, so MLPs (which have been used before on this task) were a natural baseline. We will explore the use of SVM approximations, as suggested by Reviewer_1.

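To illustrate why approximations matter here: kernel SVM training scales super-linearly in the number of examples, while a primal hinge-loss SVM trained by stochastic gradient descent touches each example in constant time. The sketch below shows one such variant (a Pegasos-style update on hypothetical toy data); it is only an example of the family of scalable SVM approximations, not necessarily the one we will use:

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_linear_svm(X, y, lam=0.01, epochs=5):
    """Primal linear SVM trained with Pegasos-style SGD on the hinge loss.
    Each update processes one example, so cost grows linearly with the data."""
    n, d = X.shape
    w, b, t = np.zeros(d), 0.0, 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # decaying step size
            margin = y[i] * (X[i] @ w + b)
            w *= (1 - eta * lam)           # shrink from the L2 penalty
            if margin < 1:                 # hinge loss is active
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b

# Hypothetical linearly separable toy data with labels in {-1, +1}
X = rng.normal(size=(2000, 10))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
w, b = sgd_linear_svm(X, y)
accuracy = np.mean(np.sign(X @ w + b) == y)
```

A kernelized model would need the full Gram matrix (or a working set over it), which is what becomes prohibitive at our dataset sizes.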
Reviewer 6: "novelty [...] is somewhat marginal since [...] reminiscent of prior work on character recognition using deformations and transformations": the main originality is in showing that deep learners can take more advantage than shallow learners of such data, and of the self-taught learning framework in general.