diff writeup/mlj_submission.tex @ 587:b1be957dd1be

Added mlj_submission to group every file needed for the MLJ submission.
author fsavard
date Thu, 30 Sep 2010 17:51:02 -0400
parents 4933077b8676
children 9a6abcf143e8
line diff
--- a/writeup/mlj_submission.tex	Wed Sep 29 21:06:47 2010 -0400
+++ b/writeup/mlj_submission.tex	Thu Sep 30 17:51:02 2010 -0400
@@ -1,12 +1,18 @@
-\documentclass{article} % For LaTeX2e
+\RequirePackage{fix-cm} % from template
+
+%\documentclass{article} % For LaTeX2e
+\documentclass[smallcondensed]{svjour3}     % onecolumn (ditto)
+
 \usepackage{times}
 \usepackage{wrapfig}
-\usepackage{amsthm,amsmath,bbm} 
+%\usepackage{amsthm} % not to be used with springer tools
+\usepackage{amsmath}
+\usepackage{bbm}
 \usepackage[psamsfonts]{amssymb}
-\usepackage{algorithm,algorithmic}
+%\usepackage{algorithm,algorithmic} % not used after all
 \usepackage[utf8]{inputenc}
 \usepackage{graphicx,subfigure}
-\usepackage[numbers]{natbib}
+\usepackage{natbib} % was [numbers]{natbib}
 
 \addtolength{\textwidth}{10mm}
 \addtolength{\evensidemargin}{-5mm}
@@ -16,8 +22,8 @@
 
 \title{Deep Self-Taught Learning for Handwritten Character Recognition}
 \author{
+Yoshua  Bengio \and
 Frédéric  Bastien \and
-Yoshua  Bengio \and
 Arnaud  Bergeron \and
 Nicolas  Boulanger-Lewandowski \and
 Thomas  Breuel \and
@@ -35,6 +41,30 @@
 Guillaume  Sicard 
 }
 \date{September 30th, submission to MLJ special issue on learning from multi-label data}
+\journalname{Machine Learning Journal}
+\institute{Frédéric  Bastien \and \\
+		Yoshua  Bengio \and \\
+		Arnaud  Bergeron \and \\
+		Nicolas  Boulanger-Lewandowski \and \\
+		Youssouf  Chherawala \and \\
+		Moustapha  Cisse \and \\ 
+		Myriam  Côté \and  \\
+		Dumitru  Erhan \and \\
+		Jeremy  Eustache \and \\
+		Xavier  Glorot \and  \\
+		Xavier  Muller \and \\
+		Sylvain  Pannetier-Lebeuf \and \\
+		Razvan  Pascanu \and  \\
+		Salah  Rifai \and \\
+		Francois  Savard \and \\
+		Guillaume  Sicard \at
+	Dept. IRO, Universite de Montreal, C.P. 6128, Montreal, QC, H3C 3J7, Canada\\
+		\email{yoshua.bengio@umontreal.ca}
+	\and
+		Thomas  Breuel \at
+	Department of Computer Science, University of Kaiserslautern, Postfach 3049, 67653 Kaiserslautern, Germany
+}
+
 
 \begin{document}
 
@@ -46,14 +76,14 @@
   Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of unlabeled examples. Here we explore the advantage brought by {\em out-of-distribution examples}.  For this purpose we developed a powerful generator of stochastic variations and noise processes for character images, including not only affine transformations but also slant, local elastic deformations, changes in thickness, background images, grey level changes, contrast, occlusion, and various types of noise. The out-of-distribution examples are obtained from these highly distorted images or by including examples of object classes different from those in the target test set.  We show that {\em deep learners benefit more from out-of-distribution examples than a corresponding shallow learner}, at least in the area of handwritten character recognition. In fact, we show that they beat previously published results and reach human-level performance on both handwritten digit classification and 62-class handwritten character recognition.
 \end{abstract}
 %\vspace*{-3mm}
-
+ 
 Keywords: self-taught learning, multi-task learning, out-of-distribution examples, deep learning, handwriting recognition.
 
 \section{Introduction}
 %\vspace*{-1mm}
 
 {\bf Deep Learning} has emerged as a promising new area of research in
-statistical machine learning (see~\citet{Bengio-2009} for a review).
+statistical machine learning (see \citet{Bengio-2009} for a review).
 Learning algorithms for deep architectures are centered on the learning
 of useful representations of data, which are better suited to the task at hand,
 and are organized in a hierarchy with multiple levels.
@@ -62,7 +92,7 @@
 different representation of the raw visual input. In fact,
 it was found recently that the features learnt in deep architectures resemble
 those observed in the first two of these stages (in areas V1 and V2
-of visual cortex)~\citep{HonglakL2008}, and that they become more and
+of visual cortex) \citep{HonglakL2008}, and that they become more and
 more invariant to factors of variation (such as camera movement) in
 higher layers~\citep{Goodfellow2009}.
 Learning a hierarchy of features increases the
@@ -1013,7 +1043,7 @@
 does not allow the model to go from the poorer basins of attraction discovered
 by the purely supervised shallow models to the kind of better basins associated
 with deep learning and self-taught learning.
-
+ 
 A Flash demo of the recognizer (where both the MLP and the SDA can be compared) 
 can be executed on-line at {\tt http://deep.host22.com}.
 
@@ -1099,9 +1129,10 @@
 %\afterpage{\clearpage}
 \clearpage
 {
+\bibliographystyle{spbasic}      % basic style, author-year citations
 \bibliography{strings,strings-short,strings-shorter,ift6266_ml,specials,aigaion-shorter}
 %\bibliographystyle{plainnat}
-\bibliographystyle{unsrtnat}
+%\bibliographystyle{unsrtnat}
 %\bibliographystyle{apalike}
 }
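
Below is a minimal sketch (not the submission itself) of the preamble and bibliography setup this changeset converges to, assembled only from lines visible in the diff above. The title is taken from the diff; the author names, institute text, and the `references` .bib file name are placeholders, and the exact placement of the front-matter commands is an assumption about how the Springer svjour3 class is typically used.

```latex
% Hedged sketch of the post-changeset setup; placeholders are marked below.
\RequirePackage{fix-cm}                  % from the Springer template
\documentclass[smallcondensed]{svjour3}  % one-column Springer journal class

\usepackage{times}
\usepackage{amsmath}                     % amsthm is avoided with the Springer tools
\usepackage{bbm}
\usepackage[psamsfonts]{amssymb}
\usepackage[utf8]{inputenc}
\usepackage{graphicx,subfigure}
\usepackage{natbib}                      % author-year citations; [numbers] option dropped

\journalname{Machine Learning Journal}
\title{Deep Self-Taught Learning for Handwritten Character Recognition}
\author{First Author \and Second Author} % placeholder names
\institute{First Author \and Second Author \at
  Some University \\
  \email{contact@example.org}}           % placeholder affiliation and address

\begin{document}
\maketitle
% ... body of the paper ...
\bibliographystyle{spbasic}              % basic Springer style, author-year citations
\bibliography{references}                % placeholder .bib file name
\end{document}
```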