# HG changeset patch
# User Yoshua Bengio <bengioy@iro.umontreal.ca>
# Date 1275415521 14400
# Node ID 920a38715c90eb3e654258a046e2f6ec26e591fa
# Parent  66a905508e34fc01f1e249d33ea5a5c39e1f8230
# Parent  d057941417ed1518741380d202ea55ba41bed1d3
merge

diff -r 66a905508e34 -r 920a38715c90 writeup/nips2010_submission.tex
--- a/writeup/nips2010_submission.tex	Tue Jun 01 14:05:02 2010 -0400
+++ b/writeup/nips2010_submission.tex	Tue Jun 01 14:05:21 2010 -0400
@@ -20,7 +20,7 @@
   Recent theoretical and empirical work in statistical machine learning has
   demonstrated the importance of learning algorithms for deep
   architectures, i.e., function classes obtained by composing multiple
-  non-linear transformations. The self-taught learning (exploiting unlabeled
+  non-linear transformations. Self-taught learning (exploiting unlabeled
   examples or examples from other distributions) has already been applied
   to deep learners, but mostly to show the advantage of unlabeled
   examples. Here we explore the advantage brought by {\em out-of-distribution
@@ -74,8 +74,8 @@
 performed similarly or better than previously proposed Restricted Boltzmann
 Machines in terms of unsupervised extraction of a hierarchy of features
 useful for classification.  The principle is that each layer starting from
-the bottom is trained to encode their input (the output of the previous
-layer) and try to reconstruct it from a corrupted version of it. After this
+the bottom is trained to encode its input (the output of the previous
+layer) and to reconstruct it from a corrupted version of it. After this
 unsupervised initialization, the stack of denoising auto-encoders can be
 converted into a deep supervised feedforward neural network and fine-tuned by
 stochastic gradient descent.
@@ -95,6 +95,8 @@
 between different regions in input space or different tasks,
 as discussed in the conclusion.
 
+% TODO: why we care to evaluate this relative advantage
+
 In this paper we ask the following questions:
 
 %\begin{enumerate}
@@ -119,7 +121,7 @@
 a corresponding shallow and purely supervised architecture?
 %\end{enumerate}
 
-The experimental results presented here provide positive evidence towards all of these questions.
+Our experimental results provide positive evidence towards all of these questions.
 
 \vspace*{-1mm}
 \section{Perturbation and Transformation of Character Images}