comparison writeup/nips2010_submission.tex @ 532:2e33885730cf

changes to charts.ods
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Tue, 01 Jun 2010 21:19:54 -0400
parents 4354c3c8f49c
children 22d5cd82d5f0
comparing revisions 529:4354c3c8f49c and 532:2e33885730cf
@@ -83,16 +83,17 @@
 stochastic gradient descent.
 
 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles
 of semi-supervised and multi-task learning: the learner can exploit examples
 that are unlabeled and/or come from a distribution different from the target
-distribution, e.g., from other classes than those of interest. Whereas
-it has already been shown that deep learners can clearly take advantage of
-unsupervised learning and unlabeled examples~\citep{Bengio-2009,WestonJ2008-small}
-and multi-task learning, not much has been done yet to explore the impact
-of {\em out-of-distribution} examples and of the multi-task setting
-(but see~\citep{CollobertR2008}). In particular, the {\em relative
+distribution, e.g., from other classes than those of interest.
+It has already been shown that deep learners can clearly take advantage of
+unsupervised learning and unlabeled examples~\citep{Bengio-2009,WestonJ2008-small},
+but more needs to be done to explore the impact
+of {\em out-of-distribution} examples and of the multi-task setting
+(one exception is~\citep{CollobertR2008}, but using very different kinds
+of learning algorithms). In particular, the {\em relative
 advantage} of deep learning for this setting has not been evaluated.
 The hypothesis explored here is that a deep hierarchy of features
 may be better able to provide sharing of statistical strength
 between different regions in input space or different tasks,
 as discussed in the conclusion.
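The self-taught learning setup described in the paragraph above (unsupervised feature learning on a pool that includes unlabeled, out-of-distribution classes, followed by supervised training on the target classes only) can be illustrated with a small self-contained sketch. Here PCA stands in for the deep feature learner and scikit-learn's digits data for the character images; all names and parameters below are illustrative assumptions, not the paper's code.

    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    X, y = load_digits(return_X_y=True)

    # Target task: classify digits 0-4; digits 5-9 play the role of
    # out-of-distribution examples, available only without their labels.
    target = y < 5
    X_target, y_target = X[target], y[target]
    X_ood = X[~target]                      # labels deliberately ignored

    # Unsupervised feature learning on the whole pool (target + OOD),
    # with PCA standing in for the deep feature learner of the paper.
    features = PCA(n_components=30).fit(np.vstack([X_target, X_ood]))

    # Supervised classifier trained on the target classes only.
    X_tr, X_te, y_tr, y_te = train_test_split(
        features.transform(X_target), y_target, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("target-task accuracy:", clf.score(X_te, y_te))

The essential point of the paradigm is the asymmetry: the feature learner never sees labels and is free to use the extra classes, while the classifier only ever sees examples of the classes of interest.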
@@ -508,11 +509,11 @@
 deep architecture (whereby complex concepts are expressed as
 compositions of simpler ones through a deep hierarchy).
 Here we chose to use the Denoising
 Auto-Encoder~\citep{VincentPLarochelleH2008} as the building block for
 these deep hierarchies of features, as it is very simple to train and
-teach (see Figure~\ref{fig:da}, as well as
+explain (see Figure~\ref{fig:da}, as well as
 the tutorial and code at {\tt http://deeplearning.net/tutorial}),
 provides immediate and efficient inference, and yielded results
 comparable to or better than those of RBMs in a series of experiments
 \citep{VincentPLarochelleH2008}. During training, a Denoising
 Auto-Encoder is presented with a stochastically corrupted version
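The paragraph above introduces the denoising auto-encoder: the input is stochastically corrupted and the model is trained to reconstruct the uncorrupted input from the corrupted one. Below is a minimal NumPy sketch of one training step, assuming a single hidden layer with sigmoid units, tied weights, and a cross-entropy reconstruction loss; the sizes, learning rate, and corruption level are illustrative assumptions, not the settings used in the paper or in the deeplearning.net tutorial.

    import numpy as np

    rng = np.random.RandomState(0)

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    # Illustrative sizes and hyper-parameters (assumptions, not the paper's).
    n_visible, n_hidden = 784, 500
    learning_rate, corruption_level = 0.1, 0.25

    W = rng.uniform(-0.01, 0.01, size=(n_visible, n_hidden))  # tied weights
    b_hid = np.zeros(n_hidden)
    b_vis = np.zeros(n_visible)

    def dae_step(x):
        """One stochastic-gradient step on a single input x with values in [0, 1]."""
        global W, b_hid, b_vis
        # Stochastic corruption: zero out a random subset of input components.
        keep = rng.binomial(1, 1.0 - corruption_level, size=x.shape)
        x_tilde = x * keep
        # Encode the corrupted input, decode with the transposed (tied) weights.
        h = sigmoid(x_tilde @ W + b_hid)
        z = sigmoid(h @ W.T + b_vis)      # reconstruction of the clean input
        # Gradients of the cross-entropy loss -sum(x*log z + (1-x)*log(1-z)).
        dz = z - x
        dh = (dz @ W) * h * (1.0 - h)
        W -= learning_rate * (np.outer(x_tilde, dh) + np.outer(dz, h))
        b_hid -= learning_rate * dh
        b_vis -= learning_rate * dz
        return z

    x = rng.rand(n_visible)               # stand-in for one training image
    reconstruction = dae_step(x)

Stacking several such layers, each trained on the hidden representation produced by the previous one, yields the deep hierarchy of features referred to in the text.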