ift6266: comparison of writeup/nips2010_submission.tex @ 537:47894d0ecbde (merge)

author:   Dumitru Erhan <dumitru.erhan@gmail.com>
date:     Tue, 01 Jun 2010 18:28:43 -0700
parents:  5157a5830125 22d5cd82d5f0
children: f0ee2212ea7c
--- writeup/nips2010_submission.tex  (536:5157a5830125)
+++ writeup/nips2010_submission.tex  (537:47894d0ecbde)
@@ -84,16 +84,17 @@
 stochastic gradient descent.
 
 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles
 of semi-supervised and multi-task learning: the learner can exploit examples
 that are unlabeled and/or come from a distribution different from the target
-distribution, e.g., from other classes that those of interest. Whereas
-it has already been shown that deep learners can clearly take advantage of
-unsupervised learning and unlabeled examples~\citep{Bengio-2009,WestonJ2008-small}
-and multi-task learning, not much has been done yet to explore the impact
+distribution, e.g., from other classes that those of interest.
+It has already been shown that deep learners can clearly take advantage of
+unsupervised learning and unlabeled examples~\citep{Bengio-2009,WestonJ2008-small},
+but more needs to be done to explore the impact
 of {\em out-of-distribution} examples and of the multi-task setting
-(but see~\citep{CollobertR2008}). In particular the {\em relative
+(one exception is~\citep{CollobertR2008}, but using very different kinds
+of learning algorithms). In particular the {\em relative
 advantage} of deep learning for this settings has not been evaluated.
 The hypothesis explored here is that a deep hierarchy of features
 may be better able to provide sharing of statistical strength
 between different regions in input space or different tasks,
 as discussed in the conclusion.
@@ -511,12 +512,12 @@
 compositions of simpler ones through a deep hierarchy).
 
 Here we chose to use the Denoising
 Auto-Encoder~\citep{VincentPLarochelleH2008} as the building block for
 these deep hierarchies of features, as it is very simple to train and
-teach (see Figure~\ref{fig:da}, as well as
-tutorial and code at {\tt http://deeplearning.net/tutorial}),
+explain (see Figure~\ref{fig:da}, as well as
+tutorial and code there: {\tt http://deeplearning.net/tutorial}),
 provides immediate and efficient inference, and yielded results
 comparable or better than RBMs in series of experiments
 \citep{VincentPLarochelleH2008}. During training, a Denoising
 Auto-Encoder is presented with a stochastically corrupted version
 of the input and trained to reconstruct the uncorrupted input,
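The training procedure described in the last hunk (corrupt the input stochastically, encode, decode, and minimize reconstruction error against the uncorrupted input) can be illustrated with a short sketch. The numpy code below is a minimal illustration only, not the paper's actual code, which follows the Stacked Denoising Auto-Encoder tutorial at http://deeplearning.net/tutorial; the layer sizes, masking-noise level, and learning rate here are placeholder values, and the single tied-weight layer with cross-entropy loss is one common instantiation among those described in \citep{VincentPLarochelleH2008}.

import numpy as np

rng = np.random.RandomState(0)
n_visible, n_hidden = 784, 500      # placeholder sizes, not the paper's settings
lr, noise_level = 0.1, 0.25         # placeholder learning rate and corruption level

W = rng.normal(scale=0.01, size=(n_visible, n_hidden))  # tied encoder/decoder weights
b_h = np.zeros(n_hidden)            # hidden-layer bias
b_v = np.zeros(n_visible)           # reconstruction bias

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_sgd_step(x):
    """One stochastic-gradient step on a single example x with values in [0, 1]."""
    global W, b_h, b_v
    # 1. Stochastically corrupt the input (masking noise: zero out random entries).
    x_tilde = x * rng.binomial(1, 1.0 - noise_level, size=x.shape)
    # 2. Encode the corrupted input, then decode a reconstruction.
    h = sigmoid(x_tilde @ W + b_h)
    x_rec = sigmoid(h @ W.T + b_v)
    # 3. Cross-entropy reconstruction loss is taken against the *uncorrupted* x.
    d_v = x_rec - x                      # gradient w.r.t. decoder pre-activation
    d_h = (d_v @ W) * h * (1.0 - h)      # gradient w.r.t. encoder pre-activation
    W -= lr * (np.outer(x_tilde, d_h) + np.outer(d_v, h))
    b_h -= lr * d_h
    b_v -= lr * d_v
    return -np.sum(x * np.log(x_rec + 1e-8) + (1 - x) * np.log(1 - x_rec + 1e-8))

# Toy usage on random binary "images"; real experiments would loop over a dataset.
for x in rng.binomial(1, 0.5, size=(10, n_visible)).astype(float):
    loss = dae_sgd_step(x)

Stacking such layers (each trained on the hidden representation of the one below) yields the deep hierarchy of features the text refers to.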