comparison writeup/techreport.tex @ 438:a6d339033d03
added AMT

author:   Yoshua Bengio <bengioy@iro.umontreal.ca>
date:     Mon, 03 May 2010 07:46:18 -0400
parents:  479f2f518fc9
children: 89258bb41e4c

with this pipeline, using the hundreds of millions of generated examples
and testing on the full NIST test set.
We find that the SDA outperforms its
shallow counterpart, an ordinary Multi-Layer Perceptron,
and that it is better able to take advantage of the additional
generated data, as well as of training with more classes than
those of interest in the end.
In fact, we find that the SDA reaches human performance,
as estimated via Amazon Mechanical Turk, on the NIST test characters.
\end{abstract}

\section{Introduction}

Deep Learning has emerged as a promising new area of research in

% ... [unchanged lines omitted] ...
The second and subsequent layers receive the same treatment, except that each
takes as input the encoded version of the data produced by the layers below it.
For additional details see \cite{vincent:icml08}.
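
The greedy layer-wise procedure described above is compact enough to sketch in
code. The following is a minimal NumPy illustration rather than the actual
implementation used in our experiments; the layer sizes, corruption level,
learning rate and helper names are hypothetical choices made for the sketch.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_dae_layer(X, n_hidden, corruption=0.25, lr=0.1, epochs=10):
    # Train one denoising autoencoder on inputs X (n_examples x n_visible).
    # Returns (W, b) so the encoded representation is sigmoid(X @ W + b);
    # the decoder reuses W (tied weights).
    n_visible = X.shape[1]
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b = np.zeros(n_hidden)    # encoder bias
    c = np.zeros(n_visible)   # decoder bias
    for _ in range(epochs):
        for x in X:
            # Corrupt the input by zeroing a random subset of components.
            x_tilde = x * (rng.random(n_visible) > corruption)
            h = sigmoid(x_tilde @ W + b)   # encode
            z = sigmoid(h @ W.T + c)       # decode
            # Gradients of the cross-entropy reconstruction loss.
            dz = z - x
            dh = (dz @ W) * h * (1.0 - h)
            W -= lr * (np.outer(x_tilde, dh) + np.outer(dz, h))
            b -= lr * dh
            c -= lr * dz
    return W, b

def pretrain_sda(X, layer_sizes):
    # Greedy stacking: each layer is trained on the encoded output of
    # the layers below it, as described in the text.
    params, inputs = [], X
    for n_hidden in layer_sizes:
        W, b = train_dae_layer(inputs, n_hidden)
        params.append((W, b))
        inputs = sigmoid(inputs @ W + b)   # feed encoded data upward
    return params
\end{verbatim}

A call such as \verb|pretrain_sda(X, [500, 500])| would pre-train a two-layer
stack on a design matrix \verb|X| with rows in $[0,1]$.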

\section{Experimental Results}

\subsection{SDA vs MLP vs Humans}

We compare here the best MLP (according to validation set error) that we found
against the best SDA (again according to validation set error), along with a
precise estimate of human performance obtained via Amazon's Mechanical Turk
(AMT) service\footnote{http://mturk.com}. AMT users are paid small amounts
of money to perform tasks for which human intelligence is required.
Mechanical Turk has been used extensively in natural language
processing \cite{SnowEtAl2008} and vision
\cite{SorokinAndForsyth2008,whitehill09}. AMT users were presented
with 10 character images and asked to type the 10 corresponding ASCII
characters. Hence they were forced to make a hard choice among the
62 character classes. Three users classified each image, allowing us
to estimate inter-human variability (shown as $\pm$ in parentheses below).
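
For concreteness, the error rate and its $\pm$ spread can be computed as in
the short sketch below. The data layout (one answer list per worker) and the
helper name are hypothetical illustrations, not the actual scripts used to
process the AMT results.

\begin{verbatim}
import numpy as np

def human_error_estimate(labels, truth):
    # labels[k][i]: ASCII character typed by worker k for image i;
    # truth[i]: the true class.  The +/- figure is the spread of the
    # per-worker error rates around their mean.
    truth = np.asarray(truth)
    per_worker = [np.mean(np.asarray(lab) != truth) for lab in labels]
    return float(np.mean(per_worker)), float(np.std(per_worker))

# Toy usage with made-up answers from three workers on five images:
truth  = list("aB3xZ")
labels = [list("aB3xZ"), list("a83xZ"), list("aB3x2")]
mean, pm = human_error_estimate(labels, truth)
print("human error: %.1f%% +/- %.1f%%" % (100 * mean, 100 * pm))
\end{verbatim}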

\begin{table}
\caption{Overall comparison of error rates on the 62 character classes (10 digits +
26 lowercase + 26 uppercase letters), except for the last column which reports
digits only, between a deep architecture with pre-training
(SDA = Stacked Denoising Autoencoder), an ordinary shallow architecture
(MLP = Multi-Layer Perceptron), and human performance.}
\label{tab:sda-vs-mlp-vs-humans}
\begin{center}
\begin{tabular}{|l|r|r|r|r|} \hline
       & NIST test & NISTP test & P07 test & NIST test digits \\ \hline
Humans &           &            &          &                  \\ \hline
SDA    &           &            &          &                  \\ \hline
MLP    &           &            &          &                  \\ \hline
\end{tabular}
\end{center}
\end{table}

\subsection{Perturbed Training Data More Helpful for SDAE}

\subsection{Training with More Classes than Necessary}
