changeset 460:fe292653a0f8

adds the last table of results
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Thu, 27 May 2010 21:53:24 -0600
parents 5ead24fd4d49
children 9609c5cf9b6b
files writeup/techreport.tex
diffstat 1 files changed, 67 insertions(+), 5 deletions(-) [+]
line wrap: on
line diff
--- a/writeup/techreport.tex	Thu May 27 08:29:26 2010 -0600
+++ b/writeup/techreport.tex	Thu May 27 21:53:24 2010 -0600
@@ -389,21 +389,83 @@
 SDA0   &  23.7\% $\pm$.14\%  &  65.2\%$\pm$.34\%  & 97.45\%$\pm$.06\%  & 2.7\% $\pm$.14\%\\ \hline 
 SDA1   &  17.1\% $\pm$.13\%  &  29.7\%$\pm$.3\%  & 29.7\%$\pm$.3\%  & 1.4\% $\pm$.1\%\\ \hline 
 SDA2   &  18.7\% $\pm$.13\%  &  33.6\%$\pm$.3\%  & 39.9\%$\pm$.17\%  & 1.7\% $\pm$.1\%\\ \hline 
-MLP0   &  24.2\% $\pm$.15\%  &  \%$\pm$.35\%  & \%$\pm$.1\%  & 3.45\% $\pm$.16\% \\ \hline 
+MLP0   &  24.2\% $\pm$.15\%  & 68.8\%$\pm$.33\%  & 78.70\%$\pm$.14\%  & 3.45\% $\pm$.15\% \\ \hline 
 MLP1   &  23.0\% $\pm$.15\%  &  41.8\%$\pm$.35\%  & 90.4\%$\pm$.1\%  & 3.85\% $\pm$.16\% \\ \hline 
-MLP2   &  ?\% $\pm$.15\%  &  ?\%$\pm$.35\%  & 90.4\%$\pm$.1\%  & 3.85\% $\pm$.16\% \\ \hline 
+MLP2   &  24.3\% $\pm$.15\%  &  46.0\%$\pm$.35\%  & 54.7\%$\pm$.17\%  & 4.85\% $\pm$.18\% \\ \hline 
 \end{tabular}
 \end{center}
 \end{table}
 
 \subsection{Perturbed Training Data More Helpful for SDAE}
 
+\begin{table}
+\caption{Relative change in error rates due to the use of perturbed training data,
+either using NISTP (for the MLP1/SDA1 models) or using P07 (for the MLP2/SDA2 models).
+A positive value indicates that training on the perturbed data helped on the
+given test set (the first three columns are on the 62-class tasks and the last one
+is on the clean 10-class digits). Clearly, the deep learning models benefited more
+from perturbed training data, even when testing on clean data, whereas the MLP
+trained on perturbed data performed worse on the clean digits and about the same
+on the clean characters. A worked example of how these entries are obtained from
+the error rates in the previous table is given in the text below.}
+\label{tab:perturbation-effect}
+\begin{center}
+\begin{tabular}{|l|r|r|r|r|} \hline
+      & NIST test          & NISTP test      & P07 test       & NIST test digits   \\ \hline
+SDA0/SDA1-1   &  38\%      &  84\%           & 228\%          &  93\% \\ \hline 
+SDA0/SDA2-1   &  27\%      &  94\%           & 144\%          &  59\% \\ \hline 
+MLP0/MLP1-1   &  5.2\%     &  65\%           & -13\%          & -10\%  \\ \hline 
+MLP0/MLP2-1   &  -0.4\%    &  49\%           & 44\%           & -29\% \\ \hline 
+\end{tabular}
+\end{center}
+\end{table}
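+
+The row labels indicate how each entry is computed: each entry is the error rate
+of the model trained on unperturbed data (MLP0 or SDA0) divided by the error rate
+of the corresponding model trained on perturbed data, minus one. For example, on
+the NIST test set the SDA0/SDA2 row follows from the error rates of the previous
+table as $23.7 / 18.7 - 1 \approx 0.27$, i.e. the 27\% entry; small discrepancies
+in other entries are presumably due to rounding of the reported error rates.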
 
-\subsection{Training with More Classes than Necessary}
+
+\subsection{Multi-Task Learning Effects}
 
-As previously seen, the SDA is better able to benefit from the transformations applied to the data than the MLP. We are now training SDAs and MLPs on single classes from NIST (respectively digits, lower case characters and upper case characters), to compare the test results with those from models trained on the entire NIST database (per-class test error, with an a priori on the desired class). The goal is to find out if training the model with more classes than necessary reduces the test error on a single class, as opposed to training it only with the desired class. We use a single hidden layer MLP with 1000 hidden units, and a SDA with 3 hidden layers (1000 hidden units per layer), pre-trained and fine-tuned on NIST.
+As previously seen, the SDA is better able to benefit from the
+transformations applied to the data than the MLP. In this experiment we
+define three tasks: recognizing digits (knowing that the input is a digit),
+recognizing upper case characters (knowing that the input is one), and
+recognizing lower case characters (knowing that the input is one).  We
+consider the digit classification task as the target task and we want to
+evaluate whether training with the other tasks can help or hurt, and
+whether the effect is different for MLPs versus SDAs.  The goal is to find
+out if deep learning can benefit more (or less) from multiple related tasks
+(i.e. the multi-task setting) compared to a corresponding purely supervised
+shallow learner.
+
+We use a single-hidden-layer MLP with 1000 hidden units, and an SDA
+with 3 hidden layers (1000 hidden units per layer), pre-trained and
+fine-tuned on NIST.
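+
+As an illustration only (this is not the code used for the experiments, and the
+input dimension \texttt{n\_in} and class count \texttt{n\_classes} below are
+placeholders), the following sketch shows the layer dimensions of the two
+architectures being compared:
+
+{\small
+\begin{verbatim}
+import numpy as np
+
+n_in, n_classes = 1024, 62   # placeholders: input size, number of classes
+rng = np.random.default_rng(0)
+
+def layer(n_input, n_output):
+    # one fully-connected layer: weight matrix and bias vector
+    return rng.normal(scale=0.01, size=(n_input, n_output)), np.zeros(n_output)
+
+# MLP: a single hidden layer of 1000 units
+mlp = [layer(n_in, 1000), layer(1000, n_classes)]
+
+# SDA: 3 hidden layers of 1000 units each; in the actual setup each hidden
+# layer is pre-trained as a denoising auto-encoder before fine-tuning
+sda = [layer(n_in, 1000), layer(1000, 1000), layer(1000, 1000),
+       layer(1000, n_classes)]
+
+def forward(params, x):
+    # forward pass with tanh hidden units; the output layer is left linear
+    # (a softmax would normally follow for classification)
+    for W, b in params[:-1]:
+        x = np.tanh(x @ W + b)
+    W, b = params[-1]
+    return x @ W + b
+\end{verbatim}
+}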
 
-Our results show that the MLP only benefits from a full NIST training on digits, and the test error is only 5\% smaller than a digits-specialized MLP. On the other hand, the SDA always gives better results when it is trained with the entire NIST database, compared to its specialized counterparts (with upper case character, the test errors is 12\% smaller, 27\% smaller on digits, and 15\% smaller on lower case characters).
+Our results show that the MLP benefits marginally from the multi-task setting
+in the case of digits (5\% relative improvement) but is actually hurt in the case
+of characters (respectively 3\% and 4\% worse for lower and upper case characters).
+On the other hand, the SDA benefited from the multi-task setting, with relative
+error rate improvements of 27\%, 15\% and 13\% respectively for digits,
+lower and upper case characters, as shown in Table~\ref{tab:multi-task}.
+
+\begin{table}
+\caption{Test error rates and relative change in error rates due to the use of
+a multi-task setting, i.e., training on each task in isolation vs. training
+on all three tasks together, for MLPs vs. SDAs. The SDA benefits much
+more from the multi-task setting. All experiments are on the
+unperturbed NIST data only, using validation error for model selection.
+Relative improvement is defined as $1 - \mbox{(single-task error)} /
+\mbox{(multi-task error)}$; a worked example follows the table.}
+\label{tab:multi-task}
+\begin{center}
+\begin{tabular}{|l|r|r|r|} \hline
+             & single-task  & multi-task  & relative \\ 
+             & setting      & setting     & improvement \\ \hline
+MLP-digits   &  3.77\%      &  3.99\%     & 5.6\%   \\ \hline 
+MLP-lower   &  17.4\%      &  16.8\%     &  -4.1\%    \\ \hline 
+MLP-upper   &  7.84\%     &  7.54\%      & -3.6\%    \\ \hline 
+SDA-digits   &  2.6\%      &  3.56\%     & 27\%    \\ \hline 
+SDA-lower   &  12.3\%      &  14.4\%    & 15\%    \\ \hline 
+SDA-upper   &  5.93\%     &  6.78\%      & 13\%    \\ \hline 
+\end{tabular}
+\end{center}
+\end{table}
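+
+As a worked example of the relative improvement reported in
+Table~\ref{tab:multi-task}, consider the SDA on digits: with a single-task
+error of 2.6\% and a multi-task error of 3.56\%, the last column gives
+$1 - 2.6/3.56 \approx 0.27$, i.e. the 27\% figure quoted above.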
 
 \section{Conclusions}