comparison: writeup/techreport.tex @ 443:89a49dae6cf3 (merge)
author:    Xavier Glorot <glorotxa@iro.umontreal.ca>
date:      Mon, 03 May 2010 18:38:58 -0400
parents:   d5b2b6397a5a 1272dc84a30c
children:  18841eeb433f
with this pipeline, using the hundreds of millions of generated examples
and testing on the full NIST test set.
We find that the SDA outperforms its
shallow counterpart, an ordinary Multi-Layer Perceptron,
and that it is better able to take advantage of the additional
generated data, as well as of training on more classes than those
of interest in the end.
In fact, we find that the SDA reaches human performance, as
estimated via Amazon Mechanical Turk, on the NIST test characters.
\end{abstract}

\section{Introduction}

Deep Learning has emerged as a promising new area of research in

\subsubsection{Data Sources}

\begin{itemize}
\item {\bf NIST}
The NIST Special Database 19 (NIST19) is a very widely used dataset for training and testing OCR systems.
It contains over 800,000 digit and character (upper and lower case) images with hand-checked classifications,
extracted from handwritten sample forms of 3,600 writers. Each character is labelled with one of 62 classes
corresponding to ``0''-``9'', ``A''-``Z'' and ``a''-``z''. The dataset contains 8 series of different complexity.
The fourth series, $hsf_4$, experimentally recognized as the most difficult one for classification, is recommended
by NIST as a test set and is used for that purpose in our work.
Results previously reported on this dataset mostly concern the digits only.
Here we use all 62 classes in both the training and testing phases.

\item {\bf Fonts}
\item {\bf Captchas}
The Captcha data source is an adaptation of the \emph{pycaptcha} library (a Python-based captcha generator) for
generating characters in the same format as the NIST dataset. The core of this data source consists of a random character
generator and various kinds of transformations similar to those described in the previous sections.
In order to increase the variability of the generated data, different fonts are used to render the characters.
Transformations (slant, distortions, rotation, translation) are applied to each randomly generated character, with a complexity
depending on the value of the complexity parameter provided by the user of the data source. Two levels of complexity are
allowed and can be controlled via an easy-to-use facade class (see the sketch after this list).
\item {\bf OCR data}
\end{itemize}
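As a rough illustration of such a facade, the sketch below generates NIST-style
$32\times 32$ character images at two complexity levels. It is only a sketch: it uses
PIL instead of \emph{pycaptcha}, applies only rotation and translation (slant and
distortions are omitted), and all names (\texttt{CaptchaSource}, \texttt{sample})
and parameter values are illustrative rather than those of the actual data source.

\begin{verbatim}
# Hypothetical facade over a random character generator; illustrative only.
import random
import numpy as np
from PIL import Image, ImageDraw, ImageFont

CLASSES = [chr(c) for c in range(ord('0'), ord('9') + 1)] + \
          [chr(c) for c in range(ord('A'), ord('Z') + 1)] + \
          [chr(c) for c in range(ord('a'), ord('z') + 1)]

class CaptchaSource:
    """Generate 32x32 NIST-style character images at two complexity levels."""

    def __init__(self, complexity=0):
        assert complexity in (0, 1)                        # two levels only
        self.max_rot = 10.0 if complexity == 0 else 30.0   # degrees
        self.max_shift = 1 if complexity == 0 else 4       # pixels

    def sample(self):
        """Return one (image, label) pair; image is a float array in [0, 1]."""
        label = random.randrange(len(CLASSES))
        img = Image.new("L", (32, 32), color=255)
        ImageDraw.Draw(img).text((8, 4), CLASSES[label], fill=0,
                                 font=ImageFont.load_default())
        # Random rotation and translation; slant/distortion omitted for brevity.
        img = img.rotate(random.uniform(-self.max_rot, self.max_rot),
                         translate=(random.randint(-self.max_shift, self.max_shift),
                                    random.randint(-self.max_shift, self.max_shift)),
                         fillcolor=255)
        return np.asarray(img, dtype="float32") / 255.0, label
\end{verbatim}

A higher \texttt{complexity} simply widens the range of the random transformation
parameters, which is the role the complexity parameter plays in the description above.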

\subsubsection{Data Sets}
\begin{itemize}
The second and subsequent layers receive the same treatment, except that each takes as input the encoded version of the data produced by the layers below it.
For additional details see \cite{vincent:icml08}.
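A minimal sketch of this greedy layer-wise procedure is given below. It is written in
plain NumPy for clarity and is not the code used for the experiments reported here; the
function names (\texttt{pretrain\_layer}, \texttt{pretrain\_sda}) and hyper-parameter
values are illustrative. Each layer is a denoising autoencoder trained to reconstruct
its clean input from a corrupted version, and the next layer is trained on the codes
produced by the previous one; supervised fine-tuning of the whole stack follows.

\begin{verbatim}
# Greedy layer-wise pre-training of a stacked denoising autoencoder (sketch).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_layer(X, n_hidden, noise=0.25, lr=0.1, epochs=10, rng=np.random):
    """Train one denoising autoencoder on X (values in [0, 1]).
    Returns the encoder parameters and the codes for the next layer."""
    n_in = X.shape[1]
    W = rng.uniform(-0.1, 0.1, size=(n_in, n_hidden))
    b_hid, b_vis = np.zeros(n_hidden), np.zeros(n_in)
    for _ in range(epochs):
        for x in X:
            x_tilde = x * (rng.uniform(size=n_in) > noise)  # masking noise
            h = sigmoid(x_tilde @ W + b_hid)                 # encode
            z = sigmoid(h @ W.T + b_vis)                     # decode (tied weights)
            dz = z - x                 # cross-entropy + sigmoid output gradient
            dh = (dz @ W) * h * (1.0 - h)
            W -= lr * (np.outer(x_tilde, dh) + np.outer(dz, h))
            b_vis -= lr * dz
            b_hid -= lr * dh
    return (W, b_hid), sigmoid(X @ W + b_hid)

def pretrain_sda(X, layer_sizes=(1000, 1000, 1000)):
    """Stack the layers: each one is trained on the previous layer's codes."""
    params, H = [], X
    for n_hidden in layer_sizes:
        layer, H = pretrain_layer(H, n_hidden)
        params.append(layer)
    return params
\end{verbatim}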

\section{Experimental Results}

\subsection{SDA vs MLP vs Humans}

We compare here the best MLP (according to validation set error) that we found against
the best SDA (again according to validation set error), along with a precise estimate
of human performance obtained via Amazon's Mechanical Turk (AMT)
service\footnote{http://mturk.com}. AMT users are paid small amounts
of money to perform tasks for which human intelligence is required.
Mechanical Turk has been used extensively in natural language
processing \cite{SnowEtAl2008} and vision
\cite{SorokinAndForsyth2008,whitehill09}. AMT users were presented
with 10 character images and asked to type the 10 corresponding ASCII
characters. Hence they were forced to make a hard choice among the
62 character classes. Three users classified each image, allowing us
to estimate inter-human variability (shown as +/- in parentheses below).

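The following sketch shows one way such an estimate can be computed, assuming that the
+/- figure is the spread (standard deviation) of the per-user error rates; the function
name \texttt{human\_error\_estimate} and the data layout are illustrative, not the
actual scripts used to process the AMT answers.

\begin{verbatim}
# Error rate and inter-human variability from AMT labels (sketch).
# votes[i, u] is the character typed by user u for image i.
import numpy as np

def human_error_estimate(votes, truth):
    votes, truth = np.asarray(votes), np.asarray(truth)
    per_user_err = (votes != truth[:, None]).mean(axis=0)  # one rate per user
    return per_user_err.mean(), per_user_err.std()         # mean and spread

# err, spread = human_error_estimate(votes, truth)
\end{verbatim}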
\begin{table}
\caption{Overall comparison of error rates on 62 character classes (10 digits +
26 lower + 26 upper case), except for the last column (digits only), between a deep architecture with pre-training
(SDA = Stacked Denoising Autoencoder) and an ordinary shallow architecture
(MLP = Multi-Layer Perceptron).}
\label{tab:sda-vs-mlp-vs-humans}
\begin{center}
\begin{tabular}{|l|r|r|r|r|} \hline
       & NIST test & NISTP test & P07 test & NIST test digits \\ \hline
Humans &           &            &          & \\ \hline
SDA    &           &            &          & \\ \hline
MLP    &           &            &          & \\ \hline
\end{tabular}
\end{center}
\end{table}

\subsection{Perturbed Training Data More Helpful for SDAE}

\subsection{Training with More Classes than Necessary}

As previously seen, the SDA is better able than the MLP to benefit from the transformations applied to the data.
We now train SDAs and MLPs on individual NIST class groups (digits, lower case characters and upper case
characters, respectively) and compare their test results with those of models trained on the entire NIST
database (using a per-group test error, i.e., with an a priori restriction to the desired classes).
The goal is to find out whether training the model with more classes than necessary reduces the test error
on a single class group, compared to training it only with the desired classes. We use a single-hidden-layer
MLP with 1000 hidden units, and an SDA with 3 hidden layers (1000 hidden units per layer), pre-trained and
fine-tuned on NIST.
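The sketch below makes this restricted evaluation explicit, assuming the 62-way classifier
outputs per-class probabilities; the function name \texttt{restricted\_error} is
illustrative and not part of the actual experiment code.

\begin{verbatim}
# Per-group test error with an a priori on the target classes (sketch):
# the 62-way outputs are restricted to the allowed subset before the argmax.
import numpy as np

def restricted_error(class_probs, labels, allowed):
    """class_probs: (n, 62) probabilities; labels: (n,) true class indices;
    allowed: indices of the classes the task is restricted to."""
    allowed = np.asarray(allowed)
    sub = class_probs[:, allowed]            # keep only allowed classes
    pred = allowed[np.argmax(sub, axis=1)]   # argmax within the subset
    return np.mean(pred != labels)

# Example: digits-only error, with digits occupying indices 0-9.
# err = restricted_error(probs, y_test, allowed=range(10))
\end{verbatim}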

Our results show that the MLP benefits from training on the full NIST set only for digits, and even then
its test error is only 5\% smaller than that of a digits-specialized MLP. On the other hand, the SDA always
gives better results when trained on the entire NIST database than its specialized counterparts: its test
error is 12\% smaller on upper case characters, 27\% smaller on digits, and 15\% smaller on lower case characters.

\section{Conclusions}

\bibliography{strings,ml,aigaion,specials}
\bibliographystyle{mlapa}