# HG changeset patch
# User Yoshua Bengio
# Date 1294601703 18000
# Node ID 5c67f674d724e4f2121fdf6de85a9de76224ec06
# Parent ea31fee25147ab2e0394cf5c00e478b009c214d2
more changes to rebuttal

diff -r ea31fee25147 -r 5c67f674d724 writeup/aistats_review_response.txt
--- a/writeup/aistats_review_response.txt Sun Jan 09 14:15:04 2011 -0500
+++ b/writeup/aistats_review_response.txt Sun Jan 09 14:35:03 2011 -0500
@@ -3,7 +3,7 @@
 * Comparisons with shallower networks, but using unsupervised pre-training:
 we will add those results to the paper. Previous work in our group with
-very similar data (the InfiniteMNIST dataset were published in JMLR in 20102
+very similar data (the InfiniteMNIST dataset was published in JMLR in 2010,
 "Why Does Unsupervised Pre-training Help Deep Learning?"). The results
 indeed show improvement when going from 1 to 2 and then 3 layers, even when
 using unsupervised pre-training (RBM or Denoising Auto-Encoder).
@@ -65,6 +65,21 @@
 to be fair, if we were to do so, we should also consider the same
 multi-stage decision process for the machine learning algorithms as well.
 
+* Size of labeled set: in our JMLR 2010 paper on deep learning (cited
+above), we already verified the effect of the number of labeled
+examples on deep and shallow learners (with or without unsupervised
+pre-training); see fig. 11 of that paper, which involves data very
+similar to those studied here. Basically (and somewhat surprisingly),
+the deep learners with unsupervised pre-training can take more
+advantage of a large number of labeled examples, presumably because of
+the initialization effect (which benefits from the prior that
+representations useful for P(X) are also useful for P(Y|X)), and this
+effect does not disappear as the number of labeled examples increases.
+Other work in the semi-supervised setting (Lee et al., NIPS 2009,
+"Unsupervised feature learning...") also shows that the advantage of
+unsupervised feature learning by a deep architecture is most pronounced
+in the semi-supervised setting with very few labeled examples. Adding
+the training curves in the self-taught setting of this AISTATS
+submission is a good idea, but unlikely to yield results different
+from those already reported in the literature in similar settings.
-* Size of labeled set:
-
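
To make the "initialization effect" mentioned in the added paragraph concrete, here is a minimal illustrative sketch (not code from the JMLR 2010 paper or from this submission; written in PyTorch, with made-up layer sizes, corruption level, epoch counts, and synthetic data): a denoising auto-encoder is first fit on unlabeled inputs, i.e. it learns a representation useful for modeling P(X), and its encoder weights are then used to initialize a supervised classifier trained for P(Y|X).

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    n_in, n_hid, n_classes = 784, 256, 10
    X_unlab = torch.rand(1000, n_in)              # unlabeled inputs (synthetic stand-in)
    X_lab = torch.rand(200, n_in)                 # labeled inputs (synthetic stand-in)
    y_lab = torch.randint(0, n_classes, (200,))   # labels (synthetic stand-in)

    # Stage 1: unsupervised pre-training of one denoising auto-encoder layer
    # (the representation is fit to model P(X), without using any labels).
    encoder = nn.Linear(n_in, n_hid)
    decoder = nn.Linear(n_hid, n_in)
    opt = torch.optim.SGD(list(encoder.parameters()) + list(decoder.parameters()), lr=0.1)
    for epoch in range(10):
        corrupted = X_unlab * (torch.rand_like(X_unlab) > 0.25).float()  # masking noise
        recon = torch.sigmoid(decoder(torch.sigmoid(encoder(corrupted))))
        loss = F.binary_cross_entropy(recon, X_unlab)  # reconstruct the clean input
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2: supervised fine-tuning for P(Y|X); the classifier's first layer
    # is initialized with the pre-trained encoder rather than at random.
    classifier = nn.Sequential(encoder, nn.Sigmoid(), nn.Linear(n_hid, n_classes))
    opt = torch.optim.SGD(classifier.parameters(), lr=0.1)
    for epoch in range(10):
        loss = F.cross_entropy(classifier(X_lab), y_lab)
        opt.zero_grad()
        loss.backward()
        opt.step()

The only point of the sketch is the wiring: reusing the pre-trained encoder in stage 2 is what "unsupervised pre-training as initialization" refers to in the rebuttal.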