comparison writeup/nips_rebuttal_clean.txt @ 578:61aae4fd2da5

typo fixed, uploaded to CMT
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Sun, 08 Aug 2010 08:16:21 -0400
parents 685756a11fd2
children 5a777a2550e0
13 "...it would be helpful to provide some theoretical analysis...": indeed, but this appears mathematically challenging (to say the least, since deep models involve a non-convex optimization) or would likely require very strong distributional assumptions. However, previous theoretical literature already provides some answers, e.g., Jonathan Baxter's (COLT 1995) "Learning internal representations". The argument is about sharing capacity across tasks to improve generalization: lower layers features can potentially be shared across tasks. Whereas a one-hidden-layer MLP can only share linear features, a deep architecture can share non-linear ones which have the potential for representing more abstract concepts. 13 "...it would be helpful to provide some theoretical analysis...": indeed, but this appears mathematically challenging (to say the least, since deep models involve a non-convex optimization) or would likely require very strong distributional assumptions. However, previous theoretical literature already provides some answers, e.g., Jonathan Baxter's (COLT 1995) "Learning internal representations". The argument is about sharing capacity across tasks to improve generalization: lower layers features can potentially be shared across tasks. Whereas a one-hidden-layer MLP can only share linear features, a deep architecture can share non-linear ones which have the potential for representing more abstract concepts.
Reviewer_5 about semi-supervised learning: In the unsupervised phase, no labels are used. In the supervised fine-tuning phase, all labels are used. So this is *not* the semi-supervised setting, which was already studied in [5], showing the advantage of depth. Instead, we focus here on the out-of-distribution aspect of self-taught learning.
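A minimal sketch of the two phases (hypothetical function and method names, not the paper's code):

    # Phase 1 sees only unlabeled examples, which in the self-taught setting
    # may come from other distributions; phase 2 uses all available labels.
    def train_self_taught(model, unlabeled_x, labeled_x, labeled_y):
        model.pretrain(unlabeled_x)           # unsupervised: no labels used
        model.finetune(labeled_x, labeled_y)  # supervised: all labels used
        return model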
17 "...human errors may be present...": Indeed, there are variations across human labelings, which have have estimated (since each character was viewed by 3 different humans), and reported in the paper (the standard deviations across humans are large, but the standard error across a large test set is very small, so we believe the average error numbers to be fairly accurate). 17 "...human errors may be present...": Indeed, there are variations across human labelings, which have been estimated (since each character was viewed by 3 different humans), and reported in the paper (the standard deviations across humans are large, but the standard error across a large test set is very small, so we believe the average error numbers to be fairly accurate).
19 "...supplement, but I did not have access to it...": strange! We could (and still can) access it. We will include a complete pseudo-code of SDAs in it. 19 "...supplement, but I did not have access to it...": strange! We could (and still can) access it. We will include a complete pseudo-code of SDAs in it.
21 "...main contributions of the manuscript...": the main contribution is actually to show that the self-taught learning setting is more beneficial to deeper architectures. 21 "...main contributions of the manuscript...": the main contribution is actually to show that the self-taught learning setting is more beneficial to deeper architectures.