changeset 504:e837ef6eef8c

commit early, commit often: a couple of changes to kick-start things
author dumitru@dumitru.mtv.corp.google.com
date Tue, 01 Jun 2010 10:53:07 -0700
parents 5927432d8b8d
children a41a8925be70 c421ea80edeb
files writeup/nips2010_submission.tex
diffstat 1 files changed, 8 insertions(+), 13 deletions(-)
--- a/writeup/nips2010_submission.tex	Tue Jun 01 12:28:05 2010 -0400
+++ b/writeup/nips2010_submission.tex	Tue Jun 01 10:53:07 2010 -0700
@@ -20,7 +20,7 @@
   Recent theoretical and empirical work in statistical machine learning has
   demonstrated the importance of learning algorithms for deep
   architectures, i.e., function classes obtained by composing multiple
-  non-linear transformations. The self-taught learning (exploiting unlabeled
+  non-linear transformations. Self-taught learning (exploiting unlabeled
   examples or examples from other distributions) has already been applied
   to deep learners, but mostly to show the advantage of unlabeled
   examples. Here we explore the advantage brought by {\em out-of-distribution
@@ -74,8 +74,8 @@
 performed similarly to or better than previously proposed Restricted Boltzmann
 Machines in terms of unsupervised extraction of a hierarchy of features
 useful for classification.  The principle is that each layer starting from
-the bottom is trained to encode their input (the output of the previous
-layer) and try to reconstruct it from a corrupted version of it. After this
+the bottom is trained to encode its input (the output of the previous
+layer) and to reconstruct it from a corrupted version of it. After this
 unsupervised initialization, the stack of denoising auto-encoders can be
 converted into a deep supervised feedforward neural network and fine-tuned by
 stochastic gradient descent.
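
The hunk above summarizes the stacked denoising auto-encoder (SDA) procedure: greedy layer-wise pre-training of each layer to reconstruct its clean input from a corrupted version, followed by supervised fine-tuning of the whole stack. The following is a minimal sketch of that idea, written with PyTorch purely for illustration; the layer sizes, corruption level, learning rates, 62-class output, and random stand-in data are assumptions for the sketch, not the paper's actual configuration or code.

import torch
import torch.nn as nn


def pretrain_dae_layer(encoder, batches, corruption=0.3, epochs=5, lr=0.01):
    """Train one denoising auto-encoder layer: corrupt the input with masking
    noise, encode it, and learn to reconstruct the *clean* input."""
    decoder = nn.Linear(encoder.out_features, encoder.in_features)
    params = list(encoder.parameters()) + list(decoder.parameters())
    opt = torch.optim.SGD(params, lr=lr)
    for _ in range(epochs):
        for x in batches:                          # x: (batch, in_features), values in [0, 1]
            noisy = x * (torch.rand_like(x) > corruption).float()   # masking noise
            code = torch.sigmoid(encoder(noisy))
            recon = torch.sigmoid(decoder(code))
            loss = nn.functional.binary_cross_entropy(recon, x)
            opt.zero_grad()
            loss.backward()
            opt.step()


# Hypothetical architecture: 784 inputs, two hidden layers, 62 output classes.
sizes = [784, 500, 200]
layers = [nn.Linear(sizes[i], sizes[i + 1]) for i in range(len(sizes) - 1)]

# Greedy unsupervised pre-training: each layer is trained on the codes
# produced by the already-trained layers below it.
batches = [torch.rand(32, 784) for _ in range(100)]   # stand-in for unlabeled images
for layer in layers:
    pretrain_dae_layer(layer, batches)
    with torch.no_grad():
        batches = [torch.sigmoid(layer(x)) for x in batches]

# Convert the pre-trained stack into a supervised feedforward network and
# fine-tune it by stochastic gradient descent on labelled examples.
modules = []
for layer in layers:
    modules += [layer, nn.Sigmoid()]
net = nn.Sequential(*modules, nn.Linear(sizes[-1], 62))
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()
# for x, y in labelled_batches:
#     optimizer.zero_grad()
#     criterion(net(x), y).backward()
#     optimizer.step()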
@@ -91,6 +91,8 @@
 (but see~\citep{CollobertR2008}). In particular, the {\em relative
 advantage} of deep learning for this setting has not been evaluated.
 
+% TODO: Explain why we care about this question.
+
 In this paper we ask the following questions:
 
 %\begin{enumerate}
@@ -115,7 +117,7 @@
 a corresponding shallow and purely supervised architecture?
 %\end{enumerate}
 
-The experimental results presented here provide positive evidence towards all of these questions.
+Our experimental results provide evidence to support positive answers to all of these questions.
 
 \vspace*{-1mm}
 \section{Perturbation and Transformation of Character Images}
@@ -525,8 +527,7 @@
 
 \begin{figure}[h]
 \resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}\\
-\caption{Left: overall results; error bars indicate a 95\% confidence interval. 
-Right: error rates on NIST test digits only, with results from literature. }
+\caption{Charts corresponding to Table 1 of Appendix I. Left: overall results; error bars indicate a 95\% confidence interval. Right: error rates on NIST test digits only, with results from the literature.}
 \label{fig:error-rates-charts}
 \end{figure}
 
@@ -566,13 +567,7 @@
 
 \begin{figure}[h]
 \resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}\\
-\caption{Relative improvement in error rate due to self-taught learning. 
-Left: Improvement (or loss, when negative)
-induced by out-of-distribution examples (perturbed data). 
-Right: Improvement (or loss, when negative) induced by multi-task 
-learning (training on all classes and testing only on either digits,
-upper case, or lower-case). The deep learner (SDA) benefits more from
-both self-taught learning scenarios, compared to the shallow MLP.}
+\caption{Charts corresponding to Tables 2 (left) and 3 (right) of Appendix I.}
 \label{fig:improvements-charts}
 \end{figure}