comparison writeup/nips2010_submission.tex @ 511:d057941417ed

a few changes in the first section

author | Dumitru Erhan <dumitru.erhan@gmail.com>
date | Tue, 01 Jun 2010 11:04:09 -0700
parents | 8c2ab4f246b1
children | 920a38715c90 0a5945249f2b
510:8c2ab4f246b1 | 511:d057941417ed
---|---
18 \vspace*{-2mm} | 18 \vspace*{-2mm}
19 \begin{abstract} | 19 \begin{abstract}
20 Recent theoretical and empirical work in statistical machine learning has | 20 Recent theoretical and empirical work in statistical machine learning has
21 demonstrated the importance of learning algorithms for deep | 21 demonstrated the importance of learning algorithms for deep
22 architectures, i.e., function classes obtained by composing multiple | 22 architectures, i.e., function classes obtained by composing multiple
23 non-linear transformations. The self-taught learning (exploiting unlabeled | 23 non-linear transformations. Self-taught learning (exploiting unlabeled
24 examples or examples from other distributions) has already been applied | 24 examples or examples from other distributions) has already been applied
25 to deep learners, but mostly to show the advantage of unlabeled | 25 to deep learners, but mostly to show the advantage of unlabeled
26 examples. Here we explore the advantage brought by {\em out-of-distribution | 26 examples. Here we explore the advantage brought by {\em out-of-distribution
27 examples} and show that {\em deep learners benefit more from them than a | 27 examples} and show that {\em deep learners benefit more from them than a
28 corresponding shallow learner}, in the area | 28 corresponding shallow learner}, in the area
72 applied here, is the Denoising | 72 applied here, is the Denoising
73 Auto-Encoder~(DEA)~\citep{VincentPLarochelleH2008-very-small}, which | 73 Auto-Encoder~(DEA)~\citep{VincentPLarochelleH2008-very-small}, which
74 performed similarly or better than previously proposed Restricted Boltzmann | 74 performed similarly or better than previously proposed Restricted Boltzmann
75 Machines in terms of unsupervised extraction of a hierarchy of features | 75 Machines in terms of unsupervised extraction of a hierarchy of features
76 useful for classification. The principle is that each layer starting from | 76 useful for classification. The principle is that each layer starting from
77 the bottom is trained to encode their input (the output of the previous | 77 the bottom is trained to encode its input (the output of the previous
78 layer) and try to reconstruct it from a corrupted version of it. After this | 78 layer) and to reconstruct it from a corrupted version of it. After this
79 unsupervised initialization, the stack of denoising auto-encoders can be | 79 unsupervised initialization, the stack of denoising auto-encoders can be
80 converted into a deep supervised feedforward neural network and fine-tuned by | 80 converted into a deep supervised feedforward neural network and fine-tuned by
81 stochastic gradient descent. | 81 stochastic gradient descent.
82 | 82
83 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles | 83 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles
89 and multi-task learning, not much has been done yet to explore the impact | 89 and multi-task learning, not much has been done yet to explore the impact
90 of {\em out-of-distribution} examples and of the multi-task setting | 90 of {\em out-of-distribution} examples and of the multi-task setting
91 (but see~\citep{CollobertR2008}). In particular the {\em relative | 91 (but see~\citep{CollobertR2008}). In particular the {\em relative
92 advantage} of deep learning for this settings has not been evaluated. | 92 advantage} of deep learning for this settings has not been evaluated.
93 | 93
 | 94 % TODO: why we care to evaluate this relative advantage
 | 95
94 In this paper we ask the following questions: | 96 In this paper we ask the following questions:
95 | 97
96 %\begin{enumerate} | 98 %\begin{enumerate}
97 $\bullet$ %\item | 99 $\bullet$ %\item
98 Do the good results previously obtained with deep architectures on the | 100 Do the good results previously obtained with deep architectures on the
113 Similarly, does the feature learning step in deep learning algorithms benefit more | 115 Similarly, does the feature learning step in deep learning algorithms benefit more
114 training with similar but different classes (i.e. a multi-task learning scenario) than | 116 training with similar but different classes (i.e. a multi-task learning scenario) than
115 a corresponding shallow and purely supervised architecture? | 117 a corresponding shallow and purely supervised architecture?
116 %\end{enumerate} | 118 %\end{enumerate}
117 | 119
118 The experimental results presented here provide positive evidence towards all of these questions. | 120 Our experimental results provide positive evidence towards all of these questions.
119 | 121
120 \vspace*{-1mm} | 122 \vspace*{-1mm}
121 \section{Perturbation and Transformation of Character Images} | 123 \section{Perturbation and Transformation of Character Images}
122 \vspace*{-1mm} | 124 \vspace*{-1mm}
123 | 125
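For context on the training scheme that the revised passage at lines 76-81 (right column) summarizes, here is a minimal sketch of greedy layer-wise pretraining of stacked denoising auto-encoders followed by supervised fine-tuning of the unrolled feedforward network with stochastic gradient descent. This is not the ift6266 code from this repository: the tied weights, masking noise, sigmoid units, squared reconstruction error, and all names, shapes, and hyperparameters below are illustrative assumptions only.

```python
# Sketch only, not the authors' implementation: stacked denoising auto-encoders,
# greedy unsupervised pretraining, then supervised fine-tuning by SGD.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DenoisingAutoencoder:
    """One layer: encode a corrupted input, reconstruct the clean input (tied weights)."""
    def __init__(self, n_in, n_hidden):
        self.W = rng.normal(0.0, 0.01, size=(n_in, n_hidden))
        self.b_h = np.zeros(n_hidden)   # encoder bias
        self.b_v = np.zeros(n_in)       # decoder bias

    def encode(self, x):
        return sigmoid(x @ self.W + self.b_h)

    def pretrain_step(self, x, corruption=0.3, lr=0.1):
        x_tilde = x * (rng.random(x.shape) > corruption)         # masking noise
        h = self.encode(x_tilde)
        x_rec = sigmoid(h @ self.W.T + self.b_v)                 # reconstruct the *clean* x
        d_rec = (x_rec - x) * x_rec * (1 - x_rec)                # grad at decoder pre-activation
        d_h = (d_rec @ self.W) * h * (1 - h)                     # grad at encoder pre-activation
        self.W -= lr * (x_tilde.T @ d_h + d_rec.T @ h) / len(x)  # tied weights: two contributions
        self.b_h -= lr * d_h.mean(axis=0)
        self.b_v -= lr * d_rec.mean(axis=0)

def pretrain_stack(layers, X, epochs=5, batch=20):
    """Greedy layer-wise pretraining: each DAE is trained on the codes of the layer below."""
    inp = X
    for dae in layers:
        for _ in range(epochs):
            for i in range(0, len(inp), batch):
                dae.pretrain_step(inp[i:i + batch])
        inp = dae.encode(inp)           # clean codes become the next layer's input
    return layers

def finetune_step(layers, W_out, b_out, x, y_onehot, lr=0.1):
    """One SGD step on the cross-entropy of the unrolled network (encoders + softmax)."""
    acts = [x]
    for dae in layers:                  # forward pass through the pretrained encoders
        acts.append(dae.encode(acts[-1]))
    logits = acts[-1] @ W_out + b_out
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    delta = (p - y_onehot) / len(x)     # backward pass
    gW_out, gb_out = acts[-1].T @ delta, delta.sum(axis=0)
    delta = (delta @ W_out.T) * acts[-1] * (1 - acts[-1])
    for dae, a_in in zip(reversed(layers), reversed(acts[:-1])):
        delta_below = (delta @ dae.W.T) * a_in * (1 - a_in)      # propagate before updating
        dae.W -= lr * a_in.T @ delta
        dae.b_h -= lr * delta.sum(axis=0)
        delta = delta_below
    W_out -= lr * gW_out
    b_out -= lr * gb_out

# Toy usage on random 28x28 "images": two hidden layers of 100 units, 10 classes.
X = rng.random((200, 28 * 28))
Y = np.eye(10)[rng.integers(0, 10, size=200)]
layers = pretrain_stack([DenoisingAutoencoder(784, 100),
                         DenoisingAutoencoder(100, 100)], X)
W_out, b_out = rng.normal(0.0, 0.01, size=(100, 10)), np.zeros(10)
for i in range(0, len(X), 20):
    finetune_step(layers, W_out, b_out, X[i:i + 20], Y[i:i + 20])
```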