comparison writeup/nips2010_submission.tex @ 519:eaa595ea2402

section 3 quickpass
author Dumitru Erhan <dumitru.erhan@gmail.com>
date Tue, 01 Jun 2010 11:32:04 -0700
parents 460a4e78c9a4
children 18a6379999fd
307 307
308 \iffalse 308 \iffalse
309 \begin{figure}[h] 309 \begin{figure}[h]
310 \resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\ 310 \resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\
311 \caption{Illustration of the pipeline of stochastic 311 \caption{Illustration of the pipeline of stochastic
312 transformations applied to the image of a lower-case t 312 transformations applied to the image of a lower-case \emph{t}
313 (the upper left image). Each image in the pipeline (going from 313 (the upper left image). Each image in the pipeline (going from
314 left to right, first top line, then bottom line) shows the result 314 left to right, first top line, then bottom line) shows the result
315 of applying one of the modules in the pipeline. The last image 315 of applying one of the modules in the pipeline. The last image
316 (bottom right) is used as a training example.} 316 (bottom right) is used as a training example.}
317 \label{fig:pipeline} 317 \label{fig:pipeline}
359 %\item 359 %\item
360 {\bf NIST.} 360 {\bf NIST.}
361 Our main source of characters is the NIST Special Database 19~\citep{Grother-1995}, 361 Our main source of characters is the NIST Special Database 19~\citep{Grother-1995},
362 widely used for training and testing character 362 widely used for training and testing character
363 recognition systems~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}. 363 recognition systems~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}.
364 The dataset is composed with 814255 digits and characters (upper and lower cases), with hand checked classifications, 364 The dataset is composed of 814255 digits and characters (upper and lower cases), with hand-checked classifications,
365 extracted from handwritten sample forms of 3600 writers. The characters are labelled with one of the 62 classes 365 extracted from handwritten sample forms of 3600 writers. The characters are labelled with one of the 62 classes
366 corresponding to "0"-"9","A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity. 366 corresponding to ``0''-``9'',``A''-``Z'' and ``a''-``z''. The dataset contains 8 parts (partitions) of varying complexity.
367 The fourth series, $hsf_4$, experimentally recognized to be the most difficult one is recommended 367 The fourth partition, $hsf_4$, experimentally recognized to be the most difficult one, is the one recommended
368 by NIST as testing set and is used in our work and some previous work~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005} 368 by NIST as a testing set; it is used both in our work and in some previous work~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}
369 for that purpose. We randomly split the remainder into a training set and a validation set for 369 for that purpose. We randomly split the remainder into a training set and a validation set for
370 model selection. The sizes of these data sets are: 651668 for training, 80000 for validation, 370 model selection. The sizes of these data sets are: 651668 for training, 80000 for validation,
371 and 82587 for testing. 371 and 82587 for testing.
372 The performance reported by previous work on this dataset was mostly obtained using only the digits. 372 The performance reported by previous work on this dataset was mostly obtained using only the digits.
373 Here we use all the classes, in both the training and testing phases. This is especially 373 Here we use all the classes, in both the training and testing phases. This is especially
374 useful to estimate the effect of a multi-task setting. 374 useful to estimate the effect of a multi-task setting.
375 Note that the distribution of the classes in the NIST training and test sets differs 375 Note that the distribution of the classes in the NIST training and test sets differs
376 substantially, with relatively many more digits in the test set, and uniform distribution 376 substantially, with relatively many more digits in the test set, and a more uniform distribution
377 of letters in the test set, not in the training set (more like the natural distribution 377 of letters in the test set, compared to the training set (in the latter, the letters are distributed
378 of letters in text). 378 more like the natural distribution of letters in text).
379 379
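A minimal sketch of the random split described above (illustrative only; the function name, default seed, and example counts are assumptions, not the actual preparation scripts):

\begin{verbatim}
import numpy as np

def split_nist_remainder(n_examples, n_valid=80000, seed=1234):
    # hsf_4 is held out entirely as the test set; the remaining
    # examples are shuffled once and divided into training and
    # validation index sets.
    rng = np.random.RandomState(seed)
    perm = rng.permutation(n_examples)
    return perm[n_valid:], perm[:n_valid]   # train_idx, valid_idx

# e.g. 651668 training + 80000 validation examples outside hsf_4
train_idx, valid_idx = split_nist_remainder(651668 + 80000)
\end{verbatim}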
380 %\item 380 %\item
381 {\bf Fonts.} 381 {\bf Fonts.}
382 In order to have a good variety of sources we downloaded an important number of free fonts from: {\tt http://anonymous.url.net} 382 In order to have a good variety of sources, we downloaded a large number of free fonts from:
383 %real adress {\tt http://cg.scs.carleton.ca/~luc/freefonts.html} 383 {\tt http://cg.scs.carleton.ca/~luc/freefonts.html}
384 in addition to Windows 7's, this adds up to a total of $9817$ different fonts that we can choose uniformly. 384 % TODO: pointless to anonymize, it's not pointing to our work
385 Including the operating system's (Windows 7) fonts, there is a total of $9817$ different fonts from which we can choose uniformly.
385 Each {\tt ttf} file is either used as input to the Captcha generator (see next item) or, by producing a corresponding image, 386 Each {\tt ttf} file is either used as input to the Captcha generator (see next item) or, by producing a corresponding image,
386 used directly as input to our models. 387 used directly as input to our models.
387 388
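As an illustration of the second use (rendering a character image directly from a font file), a minimal sketch using PIL is given below; the font path, image size, and glyph placement are assumptions, not the actual rendering code used in our experiments.

\begin{verbatim}
from PIL import Image, ImageDraw, ImageFont

def render_char(ttf_path, char, size=32):
    # Render a single character from a .ttf font into a size x size
    # grayscale image (white background, black glyph), roughly the
    # format in which characters are fed to our models.
    font = ImageFont.truetype(ttf_path, size)
    img = Image.new("L", (size, size), color=255)
    ImageDraw.Draw(img).text((2, 0), char, fill=0, font=font)
    return img

# example usage (the font path is hypothetical)
# img = render_char("fonts/SomeFreeFont.ttf", "t")
\end{verbatim}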
388 %\item 389 %\item
389 {\bf Captchas.} 390 {\bf Captchas.}
390 The Captcha data source is an adaptation of the \emph{pycaptcha} library (a Python-based captcha generator) for 391 The Captcha data source is an adaptation of the \emph{pycaptcha} library (a Python-based captcha generator) for
391 generating characters of the same format as the NIST dataset. This software is based on 392 generating characters of the same format as the NIST dataset. This software is based on
392 a random character class generator and various kinds of transformations similar to those described in the previous sections. 393 a random character class generator and various kinds of transformations similar to those described in the previous sections.
393 In order to increase the variability of the data generated, many different fonts are used for generating the characters. 394 In order to increase the variability of the data generated, many different fonts are used for generating the characters.
394 Transformations (slant, distortions, rotation, translation) are applied to each randomly generated character with a complexity 395 Transformations (slant, distortions, rotation, translation) are applied to each randomly generated character with a complexity
395 depending on the value of the complexity parameter provided by the user of the data source. Two levels of complexity are 396 depending on the value of the complexity parameter provided by the user of the data source.
396 allowed and can be controlled via an easy to use facade class. 397 %Two levels of complexity are allowed and can be controlled via an easy to use facade class. %TODO: what's a facade class?
397 398
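To give a rough idea of how the complexity parameter acts, the sketch below applies a rotation and a translation to an already rendered character image, with amplitudes scaled by the complexity value; the transformation set and parameter ranges are illustrative assumptions, not the actual pycaptcha adaptation.

\begin{verbatim}
import random
from PIL import Image

def distort(img, complexity=0.5):
    # img: a PIL grayscale ("L") character image on a white background.
    # The amplitude of each transformation grows with complexity in [0, 1].
    angle = random.uniform(-30, 30) * complexity      # rotation
    img = img.rotate(angle, fillcolor=255)
    dx = int(random.uniform(-4, 4) * complexity)      # translation
    dy = int(random.uniform(-4, 4) * complexity)
    return img.transform(img.size, Image.AFFINE,
                         (1, 0, dx, 0, 1, dy), fillcolor=255)
\end{verbatim}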
398 %\item 399 %\item
399 {\bf OCR data.} 400 {\bf OCR data.}
400 A large set (2 million) of scanned, OCRed and manually verified machine-printed 401 A large set (2 million) of scanned, OCRed and manually verified machine-printed
401 characters (from various documents and books) were included as an 402 characters (from various documents and books) were included as an
402 additional source. This set is part of a larger corpus being collected by the Image Understanding 403 additional source. This set is part of a larger corpus being collected by the Image Understanding
403 Pattern Recognition Research group led by Thomas Breuel at the University of Kaiserslautern 404 Pattern Recognition Research group led by Thomas Breuel at the University of Kaiserslautern
404 ({\tt http://www.iupr.com}), and which will be publicly released. 405 ({\tt http://www.iupr.com}), and which will be publicly released.
406 %TODO: let's hope that Thomas is not a reviewer! :) Seriously though, maybe we should anonymize this
405 %\end{itemize} 407 %\end{itemize}
406 408
407 \vspace*{-1mm} 409 \vspace*{-1mm}
408 \subsection{Data Sets} 410 \subsection{Data Sets}
409 \vspace*{-1mm} 411 \vspace*{-1mm}
442 Whereas previous work had compared deep architectures to both shallow MLPs and 444 Whereas previous work had compared deep architectures to both shallow MLPs and
443 SVMs, we only compared to MLPs here because of the very large datasets used 445 SVMs, we only compared to MLPs here because of the very large datasets used
444 (making the use of SVMs computationally inconvenient because of their quadratic 446 (making the use of SVMs computationally inconvenient because of their quadratic
445 scaling behavior). 447 scaling behavior).
446 The MLP has a single hidden layer with $\tanh$ activation functions, and softmax (normalized 448 The MLP has a single hidden layer with $\tanh$ activation functions, and softmax (normalized
447 exponentials) on the output layer for estimating P(class | image). 449 exponentials) on the output layer for estimating $P(\mathrm{class} \mid \mathrm{image})$.
448 The hyper-parameters are the following: number of hidden units, taken in 450 The number of hidden units is taken in $\{300,500,800,1000,1500\}$.
449 $\{300,500,800,1000,1500\}$. The optimization procedure is as follows. Training 451 The optimization procedure is as follows: training
450 examples are presented in minibatches of size 20. A constant learning 452 examples are presented in minibatches of size 20, a constant learning
451 rate is chosen in $\{10^{-3}, 0.01, 0.025, 0.075, 0.1, 0.5\}$ 453 rate is chosen in $\{10^{-3}, 0.01, 0.025, 0.075, 0.1, 0.5\}$
452 through preliminary experiments, and 0.1 was selected. 454 through preliminary experiments (measuring performance on a validation set),
455 and $0.1$ was then selected.
453 456
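For concreteness, a minimal NumPy sketch of one stochastic gradient step for this single-hidden-layer MLP is given below; it is illustrative only (the actual experiments used a different implementation), and parameter initialization is left to the caller.

\begin{verbatim}
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sgd_step(params, x, y, lr=0.1):
    # x: (20, n_in) minibatch of images; y: (20,) integer class labels.
    W1, b1, W2, b2 = params
    h = np.tanh(x @ W1 + b1)              # tanh hidden layer
    p = softmax(h @ W2 + b2)              # estimates P(class | image)
    onehot = np.eye(W2.shape[1])[y]
    # gradients of the mean negative log-likelihood
    dlogits = (p - onehot) / x.shape[0]
    dW2, db2 = h.T @ dlogits, dlogits.sum(axis=0)
    dh = dlogits @ W2.T
    dz1 = dh * (1.0 - h ** 2)             # derivative of tanh
    dW1, db1 = x.T @ dz1, dz1.sum(axis=0)
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2
    return -np.log(p[np.arange(len(y)), y]).mean()
\end{verbatim}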
454 {\bf Stacked Denoising Auto-Encoders (SDA).} 457 {\bf Stacked Denoising Auto-Encoders (SDA).}
455 Various auto-encoder variants and Restricted Boltzmann Machines (RBMs) 458 Various auto-encoder variants and Restricted Boltzmann Machines (RBMs)
456 can be used to initialize the weights of each layer of a deep MLP (with many hidden 459 can be used to initialize the weights of each layer of a deep MLP (with many hidden
457 layers)~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006} 460 layers)~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006}
470 % ADD AN IMAGE? 473 % ADD AN IMAGE?
471 these deep hierarchies of features, as it is very simple to train and 474 these deep hierarchies of features, as it is very simple to train and
472 teach (see the tutorial and code at {\tt http://deeplearning.net/tutorial}), 475 teach (see the tutorial and code at {\tt http://deeplearning.net/tutorial}),
473 provides immediate and efficient inference, and yielded results 476 provides immediate and efficient inference, and yielded results
474 comparable to or better than RBMs in a series of experiments 477 comparable to or better than RBMs in a series of experiments
475 \citep{VincentPLarochelleH2008}. During training of a Denoising 478 \citep{VincentPLarochelleH2008}. During training, a Denoising
476 Auto-Encoder, it is presented with a stochastically corrupted version 479 Auto-Encoder is presented with a stochastically corrupted version
477 of the input and trained to reconstruct the uncorrupted input, 480 of the input and trained to reconstruct the uncorrupted input,
478 forcing the hidden units to represent the leading regularities in 481 forcing the hidden units to represent the leading regularities in
479 the data. Once it is trained, its hidden units' activations can 482 the data. Once it is trained, its hidden units' activations can
480 be used as inputs for training a second one, etc. 483 be used as inputs for training a second one, etc.
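As an illustration (not the exact implementation or reconstruction loss used here), one training update for a single denoising auto-encoder layer, assuming masking corruption, sigmoid units, tied weights and a squared reconstruction error, can be sketched as follows.

\begin{verbatim}
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_step(W, bh, bv, x, corruption=0.25, lr=0.01, rng=np.random):
    # Corrupt the input, encode, decode with the tied weights, and
    # compute gradients of the reconstruction error against the
    # *uncorrupted* input.  W: (n_visible, n_hidden).
    mask = rng.binomial(1, 1.0 - corruption, size=x.shape)
    x_tilde = x * mask                     # masking corruption
    h = sigmoid(x_tilde @ W + bh)          # encoder
    r = sigmoid(h @ W.T + bv)              # decoder (tied weights)
    dr = (r - x) * r * (1.0 - r) / x.shape[0]
    dh = (dr @ W) * h * (1.0 - h)
    dW = x_tilde.T @ dh + dr.T @ h         # both uses of the tied W
    W -= lr * dW
    bh -= lr * dh.sum(axis=0)
    bv -= lr * dr.sum(axis=0)
    return 0.5 * ((r - x) ** 2).sum(axis=1).mean()
\end{verbatim}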
481 After this unsupervised pre-training stage, the parameters 484 After this unsupervised pre-training stage, the parameters