ift6266: comparison of writeup/nips2010_submission.tex @ 514:920a38715c90
merge
| author | Yoshua Bengio <bengioy@iro.umontreal.ca> |
|---|---|
| date | Tue, 01 Jun 2010 14:05:21 -0400 |
| parents | 66a905508e34 d057941417ed |
| children | 092dae9a5040 |
513:66a905508e34 (parent) | 514:920a38715c90 (this changeset) |
---|---|
18 \vspace*{-2mm} | 18 \vspace*{-2mm} |
19 \begin{abstract} | 19 \begin{abstract} |
20 Recent theoretical and empirical work in statistical machine learning has | 20 Recent theoretical and empirical work in statistical machine learning has |
21 demonstrated the importance of learning algorithms for deep | 21 demonstrated the importance of learning algorithms for deep |
22 architectures, i.e., function classes obtained by composing multiple | 22 architectures, i.e., function classes obtained by composing multiple |
23 non-linear transformations. The self-taught learning (exploiting unlabeled | 23 non-linear transformations. Self-taught learning (exploiting unlabeled |
24 examples or examples from other distributions) has already been applied | 24 examples or examples from other distributions) has already been applied |
25 to deep learners, but mostly to show the advantage of unlabeled | 25 to deep learners, but mostly to show the advantage of unlabeled |
26 examples. Here we explore the advantage brought by {\em out-of-distribution | 26 examples. Here we explore the advantage brought by {\em out-of-distribution |
27 examples} and show that {\em deep learners benefit more from them than a | 27 examples} and show that {\em deep learners benefit more from them than a |
28 corresponding shallow learner}, in the area | 28 corresponding shallow learner}, in the area |
[... lines 29-71 not shown ...]
72 applied here, is the Denoising | 72 applied here, is the Denoising |
73 Auto-Encoder~(DEA)~\citep{VincentPLarochelleH2008-very-small}, which | 73 Auto-Encoder~(DEA)~\citep{VincentPLarochelleH2008-very-small}, which |
74 performed similarly or better than previously proposed Restricted Boltzmann | 74 performed similarly or better than previously proposed Restricted Boltzmann |
75 Machines in terms of unsupervised extraction of a hierarchy of features | 75 Machines in terms of unsupervised extraction of a hierarchy of features |
76 useful for classification. The principle is that each layer starting from | 76 useful for classification. The principle is that each layer starting from |
77 the bottom is trained to encode their input (the output of the previous | 77 the bottom is trained to encode its input (the output of the previous |
78 layer) and try to reconstruct it from a corrupted version of it. After this | 78 layer) and to reconstruct it from a corrupted version of it. After this |
79 unsupervised initialization, the stack of denoising auto-encoders can be | 79 unsupervised initialization, the stack of denoising auto-encoders can be |
80 converted into a deep supervised feedforward neural network and fine-tuned by | 80 converted into a deep supervised feedforward neural network and fine-tuned by |
81 stochastic gradient descent. | 81 stochastic gradient descent. |
82 | 82 |
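As a minimal sketch of the denoising auto-encoder training step described in the paragraph above (corrupt the input, encode the corrupted version, reconstruct the clean input), one common formulation is given below; the notation ($x$, $\tilde{x}$, $h$, $\hat{x}$, the corruption distribution $q_{\mathcal{D}}$, the squashing function $s$, and the parameters $W$, $b$, $W'$, $b'$) is chosen here for illustration and is not taken from either revision of the paper.

% Illustrative formulation only; the symbols are assumptions, not the paper's notation.
\begin{align*}
  \tilde{x} &\sim q_{\mathcal{D}}(\tilde{x} \mid x)  && \text{stochastically corrupt the input} \\
  h &= s(W\tilde{x} + b)                             && \text{encode the corrupted input} \\
  \hat{x} &= s(W'h + b')                             && \text{decode, i.e.\ reconstruct the clean input} \\
  L(x,\hat{x}) &= -\sum_i \big[ x_i \log \hat{x}_i + (1-x_i)\log(1-\hat{x}_i) \big]
  && \text{reconstruction loss (for inputs in } [0,1] \text{), minimized by SGD}
\end{align*}

After this unsupervised stage, each layer's encoder ($W$, $b$) would initialize one layer of the deep feedforward network, which is then fine-tuned with a supervised objective by stochastic gradient descent, as the paragraph states.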
83 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles | 83 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles |
93 The hypothesis explored here is that a deep hierarchy of features | 93 The hypothesis explored here is that a deep hierarchy of features |
94 may be better able to provide sharing of statistical strength | 94 may be better able to provide sharing of statistical strength |
95 between different regions in input space or different tasks, | 95 between different regions in input space or different tasks, |
96 as discussed in the conclusion. | 96 as discussed in the conclusion. |
97 | 97 |
| 98 % TODO: why we care to evaluate this relative advantage |
| 99 |
98 In this paper we ask the following questions: | 100 In this paper we ask the following questions: |
99 | 101 |
100 %\begin{enumerate} | 102 %\begin{enumerate} |
101 $\bullet$ %\item | 103 $\bullet$ %\item |
102 Do the good results previously obtained with deep architectures on the | 104 Do the good results previously obtained with deep architectures on the |
[... lines 103-116 / 105-118 not shown ...]
117 Similarly, does the feature learning step in deep learning algorithms benefit more from | 119 Similarly, does the feature learning step in deep learning algorithms benefit more from |
118 training with similar but different classes (i.e. a multi-task learning scenario) than | 120 training with similar but different classes (i.e. a multi-task learning scenario) than |
119 a corresponding shallow and purely supervised architecture? | 121 a corresponding shallow and purely supervised architecture? |
120 %\end{enumerate} | 122 %\end{enumerate} |
121 | 123 |
122 The experimental results presented here provide positive evidence towards all of these questions. | 124 Our experimental results provide positive evidence towards all of these questions. |
123 | 125 |
124 \vspace*{-1mm} | 126 \vspace*{-1mm} |
125 \section{Perturbation and Transformation of Character Images} | 127 \section{Perturbation and Transformation of Character Images} |
126 \vspace*{-1mm} | 128 \vspace*{-1mm} |
127 | 129 |