comparison writeup/nips2010_submission.tex @ 537:47894d0ecbde

merge
author Dumitru Erhan <dumitru.erhan@gmail.com>
date Tue, 01 Jun 2010 18:28:43 -0700
parents 5157a5830125 22d5cd82d5f0
children f0ee2212ea7c
stochastic gradient descent.

Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles
of semi-supervised and multi-task learning: the learner can exploit examples
that are unlabeled and/or come from a distribution different from the target
distribution, e.g., from other classes than those of interest.
It has already been shown that deep learners can clearly take advantage of
unsupervised learning and unlabeled examples~\citep{Bengio-2009,WestonJ2008-small},
but more needs to be done to explore the impact
of {\em out-of-distribution} examples and of the multi-task setting
(one exception is~\citep{CollobertR2008}, but using very different kinds
of learning algorithms). In particular, the {\em relative
advantage} of deep learning for these settings has not been evaluated.
The hypothesis explored here is that a deep hierarchy of features
may be better able to provide sharing of statistical strength
between different regions in input space or different tasks,
as discussed in the conclusion.
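The self-taught setting can be summarized as follows: features are first
learned without labels on examples that may include out-of-distribution
classes, and a classifier for the classes of interest is then trained on top
of those features. The sketch below illustrates this, assuming (as a
simplification) a single denoising auto-encoder layer trained with plain
NumPy and stochastic gradient descent; the data arrays, hyperparameters, and
the least-squares classifier are illustrative placeholders, not the
experimental setup used here.

\begin{verbatim}
# Minimal sketch (not the authors' code) of the self-taught learning setup:
# unsupervised feature learning on unlabeled examples that may come from
# classes other than the target ones, followed by supervised training on
# the target task only. Shapes and hyperparameters are illustrative.
import numpy as np

rng = np.random.RandomState(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_denoising_autoencoder(X, n_hidden=100, corruption=0.25,
                                lr=0.05, n_epochs=10):
    """Learn features (W, b) by reconstructing inputs from corrupted copies."""
    n_visible = X.shape[1]
    W = rng.normal(0, 0.01, size=(n_visible, n_hidden))
    b = np.zeros(n_hidden)   # hidden biases
    c = np.zeros(n_visible)  # reconstruction biases
    for _ in range(n_epochs):
        for x in X:
            # stochastically corrupt the input (zero out random entries)
            mask = rng.binomial(1, 1.0 - corruption, size=x.shape)
            x_tilde = x * mask
            h = sigmoid(x_tilde @ W + b)      # encode corrupted input
            x_hat = sigmoid(h @ W.T + c)      # decode (tied weights)
            grad_out = x_hat - x              # cross-entropy output gradient
            grad_h = (grad_out @ W) * h * (1 - h)
            W -= lr * (np.outer(x_tilde, grad_h) + np.outer(grad_out, h))
            b -= lr * grad_h
            c -= lr * grad_out
    return W, b

# X_unlabeled: examples from many classes (possibly out-of-distribution);
# X_target, y_target: labeled examples from the classes of interest only.
X_unlabeled = rng.rand(500, 64)                     # placeholder data
X_target, y_target = rng.rand(100, 64), rng.randint(0, 2, 100)

W, b = train_denoising_autoencoder(X_unlabeled)
features = sigmoid(X_target @ W + b)                # reuse learned features
# A simple classifier (here: least squares on features) for the target task.
w_clf = np.linalg.lstsq(features, y_target, rcond=None)[0]
\end{verbatim}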
[...]

compositions of simpler ones through a deep hierarchy).

Here we chose to use the Denoising
Auto-Encoder~\citep{VincentPLarochelleH2008} as the building block for
these deep hierarchies of features, as it is very simple to train and
explain (see Figure~\ref{fig:da}, as well as
tutorial and code at {\tt http://deeplearning.net/tutorial}),
provides immediate and efficient inference, and yielded results
comparable to or better than RBMs in a series of experiments
\citep{VincentPLarochelleH2008}. During training, a Denoising
Auto-Encoder is presented with a stochastically corrupted version
of the input and trained to reconstruct the uncorrupted input,