ift6266: writeup/nips_rebuttal

annotate writeup/nips_rebuttal_clean.txt @ 574:d12b9a1432e8

cleaned-up version, fewer typos, shortened (but need 700 chars less)

author	Dumitru Erhan <dumitru.erhan@gmail.com>
date	Sat, 07 Aug 2010 18:39:36 -0700
parents
children	bff9ab360ef4

Reviewer_1 claims that handwriting recognition is essentially solved; we
believe this is not true. The best methods do reach essentially human
performance on clean digits, but we are not aware of previous papers showing
that human performance has been reached on the full character set.
Furthermore, our own experiments make clear that humans still greatly
outperform machines when the characters are heavily distorted (e.g. the NISTP
dataset). Playing with the provided demo will quickly convince you of this.

"...not intended to compete with the state-of-the-art...": We actually included
comparisons with the state-of-the-art on the NIST dataset (and beat it).

"the demonstrations that self-taught learning can help deep learners is
helpful": indeed, but the more interesting result is that self-taught learning
was found *more helpful for deep learners than for shallow ones*. Since
out-of-distribution data (especially out-of-class data) is commonly available,
this is of practical importance.

Reviewer_4: "It would also be interesting to compare to SVMs...": ordinary SVMs
cannot be used on such large datasets, so it is indeed a good idea to explore
variants or approximations of SVMs. We will continue exploring this thread (and
the particular suggestion made) and hope to include the results in the final
paper, adding more shallow learners to the comparison.

"...it would be helpful to provide some theoretical analysis...": indeed, but
this is either mathematically challenging (to say the least, since deep models
involve a non-convex optimization) or would likely require very strong
assumptions on the data distribution. However, there is theoretical literature
that answers some basic questions about this issue, starting with the work of
Jonathan Baxter (COLT 1995), "Learning internal representations". The argument
is about capacity and sharing it across tasks so as to achieve better
generalization. The lower layers implement features that can potentially be
shared across tasks. As long as some sharing is possible (because the same
features can be useful for several tasks), there is a potential benefit from
shared internal representations. Whereas a one-hidden-layer MLP can only share
linear features, a deep architecture can share non-linear ones, which have the
potential for representing more abstract concepts.

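As a rough illustration of the capacity-sharing argument (not Baxter's
analysis, and not the architecture used in the paper), the sketch below counts
the free parameters of several separate networks versus one network whose
lower layers are shared across tasks. The function names and all layer sizes
are hypothetical, chosen only for illustration.

```python
def mlp_param_count(layer_sizes):
    """Weights plus biases of a fully connected net with the given layer sizes."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def separate_vs_shared(n_inputs, hidden_sizes, n_outputs_per_task, n_tasks):
    """Return (separate, shared) parameter counts for n_tasks tasks.

    separate: one full network per task.
    shared:   one trunk of hidden layers plus one output layer per task.
    """
    full_net = [n_inputs, *hidden_sizes, n_outputs_per_task]
    separate = n_tasks * mlp_param_count(full_net)
    trunk = mlp_param_count([n_inputs, *hidden_sizes])
    heads = n_tasks * mlp_param_count([hidden_sizes[-1], n_outputs_per_task])
    return separate, trunk + heads


# Hypothetical sizes: 32x32 inputs, three hidden layers, three tasks.
separate, shared = separate_vs_shared(1024, [1000, 1000, 1000], 10, 3)
print("separate:", separate, "shared:", shared)
```

With the lower layers shared, the per-task capacity that must be fit from each
task's own data reduces to the output layer; whether this helps in practice
depends on how much the tasks actually have in common.
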
Reviewer_5 about semi-supervised learning: In the unsupervised phase, no labels
are used. In the supervised fine-tuning phase, all labels are used, so this is
not the semi-supervised setting. This paper did not examine the potential
advantage of exploiting large quantities of additional unlabeled data, but the
availability of the generated dataset and of the learning setup would make it
easy to conduct a study answering this interesting question. Note however that
previous work [5] already investigated the relative advantage of the
semi-supervised setting for deep vs shallow architectures, which is why we did
not focus on this here. It might still be worth doing these experiments because
the deep learning algorithms were different.

"...human errors may be present...": Indeed, there are variations across human
labelings, which we have estimated (since each character was viewed by 3
different humans) and reported in the paper. The standard deviations across
humans are large, but the standard error across a large test set is very
small, so we believe the average error numbers to be fairly accurate.

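To make the distinction concrete, here is a minimal sketch of how the standard
error of an average error rate shrinks with the number of test examples. It
assumes independent examples and the usual binomial formula sqrt(p(1-p)/n),
which may differ from the estimate used in the paper. The error rate and test
set sizes are hypothetical, not results from the paper.

```python
import math


def standard_error(error_rate: float, n_examples: int) -> float:
    """Binomial standard error of an error rate measured on n_examples."""
    if n_examples <= 0:
        raise ValueError("n_examples must be positive")
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must lie in [0, 1]")
    return math.sqrt(error_rate * (1.0 - error_rate) / n_examples)


# Hypothetical 20% error rate measured on test sets of increasing size.
for n in (100, 10_000, 1_000_000):
    print(n, standard_error(0.2, n))
```

The spread between individual labelers does not shrink with the test set size,
whereas the uncertainty on the average error rate falls as 1/sqrt(n).
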
"...authors do cite a supplement, but I did not have access to it...": that is
strange. We could (and still can) access it from the CMT web site. We will make
sure to include complete pseudo-code for SDAs in it.

"...main contributions of the manuscript...": the main contribution is actually
to show that the self-taught learning setting is more beneficial to deeper
architectures.

"...restriction to MLPs...": that restriction was motivated by the computational
challenge of training on hundreds of millions of examples. Apart from linear
models (which do not fare well on this task), it is not clear to us what could
be used, and so MLPs were the obvious candidates to compare with. We will
explore the use of SVM approximations, as suggested by Reviewer_1. Other
suggestions are welcome.

Reviewer_6: "...novelty [...] is somewhat marginal since [...] reminiscent of
prior work on character recognition using deformations and transformations".
The main originality is in showing that deep learners can take more advantage
than shallow learners of such data and of the self-taught learning framework in
general.
