\documentclass{article} % For LaTeX2e
\usepackage{times}
\usepackage{wrapfig}
\usepackage{amsthm,amsmath,bbm}
\usepackage[psamsfonts]{amssymb}
\usepackage{algorithm,algorithmic}
\usepackage[utf8]{inputenc}
\usepackage{graphicx,subfigure}
\usepackage[numbers]{natbib}

\addtolength{\textwidth}{10mm}
\addtolength{\evensidemargin}{-5mm}
\addtolength{\oddsidemargin}{-5mm}

%\setlength\parindent{0mm}

\begin{document}

\begin{center}
{\Large Deep Self-Taught Learning for Handwritten Character Recognition}

{\bf \large Information on Main Contributions}
\end{center}

\setlength{\parindent}{0cm}

%\vspace*{-2mm}
\section*{Background and Related Contributions}
%\vspace*{-2mm}
%{\large \bf Background and Related Contributions}

Recent theoretical and empirical work in statistical machine learning has
demonstrated the potential of learning algorithms for {\bf deep
architectures}, i.e., function classes obtained by composing multiple
levels of representation
\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,VincentPLarochelleH2008,ranzato-08,Larochelle-jmlr-2009,Salakhutdinov+Hinton-2009,HonglakL2009,HonglakLNIPS2009,Jarrett-ICCV2009,Taylor-cvpr-2010}.
See~\citet{Bengio-2009} for a review of deep learning algorithms.
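As an illustrative sketch (our notation here, not taken from the cited works), a deep
architecture with $L$ levels can be written as a composition of parametrized maps,
\[
  f(x) \;=\; f_L\big(f_{L-1}(\cdots f_1(x) \cdots)\big),
  \qquad f_\ell(h) \;=\; s\!\big(W_\ell h + b_\ell\big),
\]
where $s$ is an elementwise non-linearity and each level $\ell$ re-represents the
output of the level below it; a shallow learner corresponds to the case $L=1$.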

{\bf Self-taught learning}~\citep{RainaR2007} is a paradigm that combines
principles of semi-supervised and multi-task learning: the learner can
exploit examples that are unlabeled and possibly come from a distribution
different from the target distribution, e.g., from classes other than those
of interest. Self-taught learning has already been applied to deep
learners, but mostly to show the advantage of unlabeled
examples~\citep{Bengio-2009,WestonJ2008-small}.
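
A minimal sketch of the paradigm is given below (illustrative Python, our own
simplification rather than the system evaluated here): a representation is first
learned from unlabeled, possibly out-of-distribution data, and the supervised
classifier for the classes of interest is then trained on top of it.
{\small
\begin{verbatim}
# Self-taught learning, minimal sketch (illustrative; not the paper's system).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_unlabeled = rng.normal(size=(10000, 32 * 32))  # images from *any* classes
X_labeled = rng.normal(size=(500, 32 * 32))      # small labeled target set
y_labeled = rng.integers(0, 62, size=500)        # 62 character classes

# 1) Learn a representation from the unlabeled (out-of-distribution) data.
#    PCA stands in for the unsupervised learner; a deep learner would stack
#    several such levels of representation.
encoder = PCA(n_components=100).fit(X_unlabeled)

# 2) Train the classifier of interest on the learned representation.
features = encoder.transform(X_labeled)
classifier = LogisticRegression(max_iter=500).fit(features, y_labeled)
\end{verbatim}
}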

There are already theoretical arguments~\citep{baxter95a} supporting the claim
that learning an {\bf intermediate representation} shared across tasks can be
beneficial for multi-task learning. It has also been argued~\citep{Bengio-2009}
that {\bf multiple levels of representation} can bring a benefit over a single level.
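
For instance (a hypothetical illustration, not the architecture used in our
experiments), several tasks can share one intermediate representation while
keeping task-specific output layers:
{\small
\begin{verbatim}
# Shared intermediate representation across tasks, minimal sketch.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 32 * 32, 100
W_shared = rng.normal(size=(n_in, n_hidden), scale=0.01)
b_shared = np.zeros(n_hidden)

# Task-specific output layers reuse the same intermediate representation.
heads = {
    "digits": rng.normal(size=(n_hidden, 10), scale=0.01),   # 10 classes
    "letters": rng.normal(size=(n_hidden, 52), scale=0.01),  # 26 + 26 classes
}

def representation(x):
    # One shared level of representation; a deep learner stacks several.
    return np.tanh(x @ W_shared + b_shared)

def predict(x, task):
    return np.argmax(representation(x) @ heads[task], axis=-1)

batch = rng.normal(size=(4, n_in))               # a small batch of images
print(predict(batch, "digits"), predict(batch, "letters"))
\end{verbatim}
}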

%{\large \bf Main Claim}
%\vspace*{-2mm}
\section*{Main Claim}
%\vspace*{-2mm}

We claim that deep learners, with several levels of representation, can
benefit more from self-taught learning than shallow learners (with a single
level), both in the multi-task setting and when learning from {\em
out-of-distribution examples} in general.

%{\large \bf Contribution to Machine Learning}
%\vspace*{-2mm}
\section*{Contribution to Machine Learning}
%\vspace*{-2mm}

We show evidence for the above claim in a large-scale setting, with
a training set consisting of hundreds of millions of examples, in the
context of handwritten character recognition with 62 classes (upper-case,
lower-case, digits).

%{\large \bf Evidence to Support the Claim}
%\vspace*{-2mm}
\section*{Evidence to Support the Claim}
%\vspace*{-2mm}

In the above experimental setting, we show that {\em deep learners benefited
significantly more from the multi-task setting than a corresponding shallow
learner}, and that they benefited more from {\em distorted (out-of-distribution)
examples} (i.e., from a distribution broader than the one from which the test
examples are drawn).
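
To make the notion of distorted examples concrete, the sketch below (hypothetical
code; the transformation pipeline used in our experiments is richer) perturbs a
character image with a random rotation and a random elastic deformation:
{\small
\begin{verbatim}
# Random distortions of a character image, minimal sketch (illustrative).
import numpy as np
from scipy import ndimage

def distort(image, rng, max_angle=15.0, sigma=4.0, alpha=8.0):
    # Random rotation of the whole image.
    angle = rng.uniform(-max_angle, max_angle)
    rotated = ndimage.rotate(image, angle, reshape=False,
                             order=1, mode="constant")
    # Smooth random displacement field (elastic deformation).
    h, w = image.shape
    dx = ndimage.gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = ndimage.gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.stack([ys + dy, xs + dx])
    return ndimage.map_coordinates(rotated, coords, order=1, mode="constant")

rng = np.random.default_rng(0)
image = rng.uniform(size=(32, 32))       # stand-in for a character image
distorted = distort(image, rng)          # same shape, perturbed content
\end{verbatim}
}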

In addition, we show that they {\em beat previously published results} on this task
(NIST Special Database 19)
and {\bf reach human-level performance} on both handwritten digit classification and
62-class handwritten character recognition.

\newpage

{\small
\bibliography{strings,strings-short,strings-shorter,ift6266_ml,specials,aigaion-shorter}
%\bibliographystyle{plainnat}
\bibliographystyle{unsrtnat}
%\bibliographystyle{apalike}
}


\end{document}