comparison writeup/nips2010_submission.tex @ 519:eaa595ea2402
section 3 quickpass
author | Dumitru Erhan <dumitru.erhan@gmail.com> |
---|---|
date | Tue, 01 Jun 2010 11:32:04 -0700 |
parents | 460a4e78c9a4 |
children | 18a6379999fd |
518:460a4e78c9a4 | 519:eaa595ea2402 |
---|---|
307 | 307 |
308 \iffalse | 308 \iffalse |
309 \begin{figure}[h] | 309 \begin{figure}[h] |
310 \resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\ | 310 \resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\ |
311 \caption{Illustration of the pipeline of stochastic | 311 \caption{Illustration of the pipeline of stochastic |
312 transformations applied to the image of a lower-case t | 312 transformations applied to the image of a lower-case \emph{t} |
313 (the upper left image). Each image in the pipeline (going from | 313 (the upper left image). Each image in the pipeline (going from |
314 left to right, first top line, then bottom line) shows the result | 314 left to right, first top line, then bottom line) shows the result |
315 of applying one of the modules in the pipeline. The last image | 315 of applying one of the modules in the pipeline. The last image |
316 (bottom right) is used as a training example.} | 316 (bottom right) is used as a training example.} |
317 \label{fig:pipeline} | 317 \label{fig:pipeline} |
359 %\item | 359 %\item |
360 {\bf NIST.} | 360 {\bf NIST.} |
361 Our main source of characters is the NIST Special Database 19~\citep{Grother-1995}, | 361 Our main source of characters is the NIST Special Database 19~\citep{Grother-1995}, |
362 widely used for training and testing character | 362 widely used for training and testing character |
363 recognition systems~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}. | 363 recognition systems~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005}. |
364 The dataset is composed with 814255 digits and characters (upper and lower cases), with hand checked classifications, | 364 The dataset is composed of 814255 digits and characters (upper and lower case), with hand-checked classifications, |
365 extracted from handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes | 365 extracted from handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes |
366 corresponding to "0"-"9","A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity. | 366 corresponding to ``0''--``9'', ``A''--``Z'' and ``a''--``z''. The dataset contains 8 parts (partitions) of varying complexity. |
367 The fourth series, $hsf_4$, experimentally recognized to be the most difficult one is recommended | 367 The fourth partition, $hsf_4$, experimentally recognized to be the most difficult one, is the one recommended |
368 by NIST as testing set and is used in our work and some previous work~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005} | 368 by NIST as a testing set and is used in our work as well as some previous work~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005} |
369 for that purpose. We randomly split the remainder into a training set and a validation set for | 369 for that purpose. We randomly split the remainder into a training set and a validation set for |
370 model selection. The sizes of these data sets are: 651668 for training, 80000 for validation, | 370 model selection. The sizes of these data sets are: 651668 for training, 80000 for validation, |
371 and 82587 for testing. | 371 and 82587 for testing. |
372 Most previous work on this dataset reports performance on the digits only. | 372 Most previous work on this dataset reports performance on the digits only. |
373 Here we use all the classes, in both the training and testing phases. This is especially | 373 Here we use all the classes, in both the training and testing phases. This is especially |
374 useful to estimate the effect of a multi-task setting. | 374 useful to estimate the effect of a multi-task setting. |
375 Note that the distribution of the classes in the NIST training and test sets differs | 375 Note that the distribution of the classes in the NIST training and test sets differs |
376 substantially, with relatively many more digits in the test set, and uniform distribution | 376 substantially, with proportionally more digits in the test set, and a more uniform distribution |
377 of letters in the test set, not in the training set (more like the natural distribution | 377 of letters in the test set, compared to the training set (in the latter, the letters are distributed |
378 of letters in text). | 378 more like the natural distribution of letters in text). |
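The split described above can be sketched in a few lines. This is a hypothetical reconstruction, not the authors' code: the seed is arbitrary, and only the counts are taken from the text (note that they are consistent: $814255 - 82587 = 651668 + 80000$).

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary seed (hypothetical)

n_total, n_test = 814255, 82587      # NIST SD19 total and hsf_4 test size
n_train, n_valid = 651668, 80000     # training/validation sizes quoted above
assert n_total - n_test == n_train + n_valid

# Randomly split the non-test examples into training and validation sets.
perm = rng.permutation(n_train + n_valid)
train_idx, valid_idx = perm[:n_train], perm[n_train:]
```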
379 | 379 |
380 %\item | 380 %\item |
381 {\bf Fonts.} | 381 {\bf Fonts.} |
382 In order to have a good variety of sources, we downloaded a large number of free fonts from: {\tt http://anonymous.url.net} | 382 In order to have a good variety of sources, we downloaded a large number of free fonts from: |
383 %real adress {\tt http://cg.scs.carleton.ca/~luc/freefonts.html} | 383 {\tt http://cg.scs.carleton.ca/~luc/freefonts.html} |
384 in addition to Windows 7's, this adds up to a total of $9817$ different fonts that we can choose uniformly. | 384 % TODO: pointless to anonymize, it's not pointing to our work |
385 Including the operating system's fonts (Windows 7), there is a total of $9817$ different fonts that we can choose from uniformly. | |
385 The {\tt ttf} file is either used as input of the Captcha generator (see next item) or, by producing a corresponding image, | 386 The {\tt ttf} file is either used as input of the Captcha generator (see next item) or, by producing a corresponding image, |
386 directly as input to our models. | 387 directly as input to our models. |
387 | 388 |
388 %\item | 389 %\item |
389 {\bf Captchas.} | 390 {\bf Captchas.} |
390 The Captcha data source is an adaptation of the \emph{pycaptcha} library (a Python-based captcha generator library) for | 391 The Captcha data source is an adaptation of the \emph{pycaptcha} library (a Python-based captcha generator library) for |
391 generating characters of the same format as the NIST dataset. This software is based on | 392 generating characters of the same format as the NIST dataset. This software is based on |
392 a random character class generator and various kinds of transformations similar to those described in the previous sections. | 393 a random character class generator and various kinds of transformations similar to those described in the previous sections. |
393 In order to increase the variability of the data generated, many different fonts are used for generating the characters. | 394 In order to increase the variability of the data generated, many different fonts are used for generating the characters. |
394 Transformations (slant, distortions, rotation, translation) are applied to each randomly generated character with a complexity | 395 Transformations (slant, distortions, rotation, translation) are applied to each randomly generated character with a complexity |
395 depending on the value of the complexity parameter provided by the user of the data source. Two levels of complexity are | 396 depending on the value of the complexity parameter provided by the user of the data source. |
396 allowed and can be controlled via an easy to use facade class. | 397 %Two levels of complexity are allowed and can be controlled via an easy to use facade class. %TODO: what's a facade class? |
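As an illustration of the kind of transformations mentioned (slant, rotation, translation), here is a minimal numpy sketch of a random affine warp applied to a character image. It is not the pycaptcha code; the function name, parameter names, and ranges are invented for illustration.

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary seed

def random_affine(img, max_rot=0.3, max_slant=0.3, max_shift=3):
    """Apply a random rotation / slant (shear) / translation to a 2-D
    image array, with nearest-neighbour resampling and zero padding.
    Hypothetical parameters, not the actual data-source settings."""
    h, w = img.shape
    theta = rng.uniform(-max_rot, max_rot)       # rotation angle (radians)
    slant = rng.uniform(-max_slant, max_slant)   # shear amount
    ty, tx = rng.randint(-max_shift, max_shift + 1, size=2)
    c, s = np.cos(theta), np.sin(theta)
    A = np.array([[c, -s], [s, c]]) @ np.array([[1.0, slant], [0.0, 1.0]])
    Ainv = np.linalg.inv(A)
    # Inverse warp: for every output pixel, look up the source pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys - h / 2 - ty, xs - w / 2 - tx], axis=-1) @ Ainv.T
    sy = np.round(coords[..., 0] + h / 2).astype(int)
    sx = np.round(coords[..., 1] + w / 2).astype(int)
    valid = (sy >= 0) & (sy < h) & (sx >= 0) & (sx < w)
    out = np.zeros_like(img)
    out[valid] = img[sy[valid], sx[valid]]
    return out
```

A complexity parameter, as described above, would simply scale `max_rot`, `max_slant`, and `max_shift`.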
397 | 398 |
398 %\item | 399 %\item |
399 {\bf OCR data.} | 400 {\bf OCR data.} |
400 A large set (2 million) of scanned, OCRed and manually verified machine-printed | 401 A large set (2 million) of scanned, OCRed and manually verified machine-printed |
401 characters (from various documents and books) were included as an | 402 characters (from various documents and books) were included as an |
402 additional source. This set is part of a larger corpus being collected by the Image Understanding | 403 additional source. This set is part of a larger corpus being collected by the Image Understanding |
403 Pattern Recognition Research group led by Thomas Breuel at the University of Kaiserslautern | 404 Pattern Recognition Research group led by Thomas Breuel at the University of Kaiserslautern |
404 ({\tt http://www.iupr.com}), and which will be publicly released. | 405 ({\tt http://www.iupr.com}), and which will be publicly released. |
406 %TODO: let's hope that Thomas is not a reviewer! :) Seriously though, maybe we should anonymize this | |
405 %\end{itemize} | 407 %\end{itemize} |
406 | 408 |
407 \vspace*{-1mm} | 409 \vspace*{-1mm} |
408 \subsection{Data Sets} | 410 \subsection{Data Sets} |
409 \vspace*{-1mm} | 411 \vspace*{-1mm} |
442 Whereas previous work had compared deep architectures to both shallow MLPs and | 444 Whereas previous work had compared deep architectures to both shallow MLPs and |
443 SVMs, we only compared to MLPs here because of the very large datasets used | 445 SVMs, we only compared to MLPs here because of the very large datasets used |
444 (making the use of SVMs computationally inconvenient because of their quadratic | 446 (making the use of SVMs computationally inconvenient because of their quadratic |
445 scaling behavior). | 447 scaling behavior). |
446 The MLP has a single hidden layer with $\tanh$ activation functions, and softmax (normalized | 448 The MLP has a single hidden layer with $\tanh$ activation functions, and softmax (normalized |
447 exponentials) on the output layer for estimating P(class | image). | 449 exponentials) on the output layer for estimating $P(\mathrm{class} \mid \mathrm{image})$. |
448 The hyper-parameters are the following: number of hidden units, taken in | 450 The number of hidden units is taken in $\{300,500,800,1000,1500\}$. |
449 $\{300,500,800,1000,1500\}$. The optimization procedure is as follows. Training | 451 The optimization procedure is as follows: training |
450 examples are presented in minibatches of size 20. A constant learning | 452 examples are presented in minibatches of size 20; a constant learning |
451 rate is chosen in $\{10^{-3}, 0.01, 0.025, 0.075, 0.1, 0.5\}$ | 453 rate is chosen in $\{10^{-3}, 0.01, 0.025, 0.075, 0.1, 0.5\}$ |
452 through preliminary experiments, and 0.1 was selected. | 454 through preliminary experiments (measuring performance on a validation set), |
455 and $0.1$ was then selected. | |
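A minimal sketch of the model and optimization procedure just described: a single tanh hidden layer, softmax outputs estimating $P(\mathrm{class} \mid \mathrm{image})$, and constant-learning-rate minibatch SGD. This is an illustrative numpy reconstruction, not the code used in the experiments; the initialization scheme and helper names are assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary seed

def init_mlp(n_in, n_hidden, n_out):
    """Single hidden layer with tanh units and a softmax output layer."""
    scale = 1.0 / np.sqrt(n_in)  # hypothetical init scale
    return {"W1": rng.uniform(-scale, scale, (n_in, n_hidden)),
            "b1": np.zeros(n_hidden),
            "W2": rng.uniform(-scale, scale, (n_hidden, n_out)),
            "b2": np.zeros(n_out)}

def softmax(a):
    a = a - a.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def sgd_step(params, x, y, lr=0.1):
    """One update on a minibatch (the text uses minibatches of size 20
    and a selected learning rate of 0.1). x: (batch, n_in); y: int labels."""
    h = np.tanh(x @ params["W1"] + params["b1"])
    p = softmax(h @ params["W2"] + params["b2"])     # P(class | image)
    n = len(y)
    loss = -np.mean(np.log(p[np.arange(n), y]))      # negative log-likelihood
    d_out = p.copy()
    d_out[np.arange(n), y] -= 1.0                    # softmax + NLL gradient
    d_out /= n
    d_h = (d_out @ params["W2"].T) * (1.0 - h ** 2)  # tanh' = 1 - tanh^2
    params["W2"] -= lr * (h.T @ d_out)
    params["b2"] -= lr * d_out.sum(axis=0)
    params["W1"] -= lr * (x.T @ d_h)
    params["b1"] -= lr * d_h.sum(axis=0)
    return loss
```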
453 | 456 |
454 {\bf Stacked Denoising Auto-Encoders (SDA).} | 457 {\bf Stacked Denoising Auto-Encoders (SDA).} |
455 Various auto-encoder variants and Restricted Boltzmann Machines (RBMs) | 458 Various auto-encoder variants and Restricted Boltzmann Machines (RBMs) |
456 can be used to initialize the weights of each layer of a deep MLP (with many hidden | 459 can be used to initialize the weights of each layer of a deep MLP (with many hidden |
457 layers)~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006} | 460 layers)~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006} |
470 % AJOUTER UNE IMAGE? | 473 % AJOUTER UNE IMAGE? |
471 these deep hierarchies of features, as it is very simple to train and | 474 these deep hierarchies of features, as it is very simple to train and |
472 teach (see the tutorial and code at {\tt http://deeplearning.net/tutorial}), | 475 teach (see the tutorial and code at {\tt http://deeplearning.net/tutorial}), |
473 provides immediate and efficient inference, and yielded results | 476 provides immediate and efficient inference, and yielded results |
474 comparable to or better than RBMs in a series of experiments | 477 comparable to or better than RBMs in a series of experiments |
475 \citep{VincentPLarochelleH2008}. During training of a Denoising | 478 \citep{VincentPLarochelleH2008}. During training, a Denoising |
476 Auto-Encoder, it is presented with a stochastically corrupted version | 479 Auto-Encoder is presented with a stochastically corrupted version |
477 of the input and trained to reconstruct the uncorrupted input, | 480 of the input and trained to reconstruct the uncorrupted input, |
478 forcing the hidden units to represent the leading regularities in | 481 forcing the hidden units to represent the leading regularities in |
479 the data. Once it is trained, its hidden unit activations can | 482 the data. Once it is trained, its hidden unit activations can |
480 be used as inputs for training a second one, etc. | 483 be used as inputs for training a second one, etc. |
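The training step just described (corrupt the input stochastically, encode, decode, and reconstruct the clean input) can be sketched as follows. This is a hedged numpy illustration with tied weights and masking noise, not the experimental code; the corruption level, learning rate, and function names are invented.

```python
import numpy as np

rng = np.random.RandomState(0)  # arbitrary seed

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def dae_step(W, b_h, b_v, x, corruption=0.25, lr=0.1):
    """One SGD step of a denoising auto-encoder with tied weights.
    x: minibatch of inputs in [0, 1], shape (batch, n_visible).
    The input is stochastically corrupted (masking noise) but the model
    is trained to reconstruct the *uncorrupted* input."""
    n = len(x)
    x_tilde = x * rng.binomial(1, 1.0 - corruption, x.shape)  # corrupt
    h = sigmoid(x_tilde @ W + b_h)            # encode
    z = sigmoid(h @ W.T + b_v)                # decode (tied weights)
    # Cross-entropy reconstruction loss against the clean input.
    eps = 1e-10
    loss = -np.mean(np.sum(x * np.log(z + eps)
                           + (1 - x) * np.log(1 - z + eps), axis=1))
    d_z = (z - x) / n                         # gradient at decoder pre-activation
    d_h = (d_z @ W) * h * (1.0 - h)           # back through encoder sigmoid
    W -= lr * (x_tilde.T @ d_h + d_z.T @ h)   # encoder + decoder terms (tied W)
    b_h -= lr * d_h.sum(axis=0)
    b_v -= lr * d_z.sum(axis=0)
    return loss
```

After training one such layer, `sigmoid(x @ W + b_h)` on clean inputs would provide the representations on which the next layer is trained, as described above.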
481 After this unsupervised pre-training stage, the parameters | 484 After this unsupervised pre-training stage, the parameters |