Mercurial repository: ift6266
comparison: writeup/nips2010_submission.tex @ 532:2e33885730cf
description: changes to charts.ods
author   | Yoshua Bengio <bengioy@iro.umontreal.ca> |
date     | Tue, 01 Jun 2010 21:19:54 -0400 |
parents  | 4354c3c8f49c |
children | 22d5cd82d5f0 |
comparison of 529:4354c3c8f49c (old) and 532:2e33885730cf (new):
@@ -83,16 +83,17 @@
 stochastic gradient descent.

 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles
 of semi-supervised and multi-task learning: the learner can exploit examples
 that are unlabeled and/or come from a distribution different from the target
-distribution, e.g., from other classes that those of interest. Whereas
-it has already been shown that deep learners can clearly take advantage of
-unsupervised learning and unlabeled examples~\citep{Bengio-2009,WestonJ2008-small}
-and multi-task learning, not much has been done yet to explore the impact
+distribution, e.g., from other classes that those of interest.
+It has already been shown that deep learners can clearly take advantage of
+unsupervised learning and unlabeled examples~\citep{Bengio-2009,WestonJ2008-small},
+but more needs to be done to explore the impact
 of {\em out-of-distribution} examples and of the multi-task setting
-(but see~\citep{CollobertR2008}). In particular the {\em relative
+(one exception is~\citep{CollobertR2008}, but using very different kinds
+of learning algorithms). In particular the {\em relative
 advantage} of deep learning for this settings has not been evaluated.
 The hypothesis explored here is that a deep hierarchy of features
 may be better able to provide sharing of statistical strength
 between different regions in input space or different tasks,
 as discussed in the conclusion.
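The self-taught learning setting discussed in this hunk can be summarized with a minimal sketch in the paper's own LaTeX (the notation below is illustrative and does not appear in the submission): the learner has a small labeled set from the target distribution plus a large unlabeled set that may come from other classes, learns a representation from the unlabeled data, and then fits a classifier on top of it.

\begin{align*}
D_{\mathrm{L}} &= \{(x_i, y_i)\}_{i=1}^{n} \sim P_{\mathrm{target}} && \text{(small labeled target-task set)} \\
D_{\mathrm{U}} &= \{x_j\}_{j=1}^{m} \sim P_{\mathrm{other}},\ m \gg n && \text{(large unlabeled, possibly out-of-distribution set)}
\end{align*}
% learn a feature map f on D_U without using any labels (e.g. by stacking
% denoising auto-encoders), then train a classifier g on the transformed
% labeled examples:
\begin{equation*}
\hat{y}(x) = g(f(x)), \qquad f \ \text{learned on}\ D_{\mathrm{U}}, \qquad g \ \text{trained on}\ \{(f(x_i), y_i)\}_{i=1}^{n}.
\end{equation*}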
@@ -508,11 +509,11 @@
 deep architecture (whereby complex concepts are expressed as
 compositions of simpler ones through a deep hierarchy).
 Here we chose to use the Denoising
 Auto-Encoder~\citep{VincentPLarochelleH2008} as the building block for
 these deep hierarchies of features, as it is very simple to train and
-teach (see Figure~\ref{fig:da}, as well as
+explain (see Figure~\ref{fig:da}, as well as
 tutorial and code there: {\tt http://deeplearning.net/tutorial}),
 provides immediate and efficient inference, and yielded results
 comparable or better than RBMs in series of experiments
 \citep{VincentPLarochelleH2008}. During training, a Denoising
 Auto-Encoder is presented with a stochastically corrupted version
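The denoising auto-encoder training step mentioned at the end of this hunk can be sketched as follows, in the spirit of \citep{VincentPLarochelleH2008} (illustrative notation, not copied from the submission): the clean input is stochastically corrupted, encoded and decoded, and the reconstruction is scored against the uncorrupted input.

% s(.) denotes the logistic sigmoid; W, W', b, b' are the learned parameters.
\begin{align*}
\tilde{x} &\sim q(\tilde{x} \mid x) && \text{stochastic corruption (e.g.\ randomly masking input components)} \\
h &= s(W \tilde{x} + b) && \text{encoder} \\
\hat{x} &= s(W' h + b') && \text{decoder} \\
\mathcal{L}(x, \hat{x}) &= -\textstyle\sum_k \left[ x_k \log \hat{x}_k + (1 - x_k) \log (1 - \hat{x}_k) \right] && \text{reconstruction loss on the \emph{clean} } x
\end{align*}
with the parameters updated by stochastic gradient descent on $\mathcal{L}$.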