comparison writeup/nips2010_submission.tex @ 501:5927432d8b8d

author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Tue, 01 Jun 2010 12:28:05 -0400
parents 8479bf822d0e
children 2b35a6e5ece4 e837ef6eef8c
different representation of the raw visual input. In fact,
it was found recently that the features learnt in deep architectures resemble
those observed in the first two of these stages (in areas V1 and V2
of visual cortex)~\citep{HonglakL2008}, and that they become more and
more invariant to factors of variation (such as camera movement) in
higher layers~\citep{Goodfellow2009}.
Learning a hierarchy of features increases the
ease and practicality of developing representations that are at once
tailored to specific tasks, yet are able to borrow statistical strength
from other related tasks (e.g., modeling different kinds of objects). Finally, learning the
feature representation can lead to higher-level (more abstract, more
amount of deformation or noise introduced.

There are two main parts in the pipeline. The first one,
from slant to pinch below, performs transformations. The second
part, from blur to contrast, adds different kinds of noise.

\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/transfo.png}}\\
\caption{Illustration of each transformation applied alone to the same image
of an upper-case h (top left). First row (from left to right): original image, slant,
thickness, affine transformation (translation, rotation, shear),
local elastic deformation; second row (from left to right):
pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right):
background image, salt and pepper noise, spatially Gaussian noise, scratches,
grey level and contrast changes.}
\label{fig:transfo}
\end{figure}
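The two-part structure described above (geometric transformations first, then noise) can be sketched as a composition of per-image operators on a 32$\times$32 grey-level array. This is a minimal illustrative sketch, not the paper's actual implementation: the \texttt{slant} and \texttt{salt\_and\_pepper} functions below are hypothetical stand-ins for two of the pipeline's fourteen stages.

```python
import numpy as np

def slant(img, shear=0.3):
    """Hypothetical slant stage: shift each row horizontally in
    proportion to its distance from the vertical centre."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        dx = int(round(shear * (y - h / 2)))
        for x in range(w):
            src = x - dx
            if 0 <= src < w:
                out[y, x] = img[y, src]
    return out

def salt_and_pepper(img, rng, p=0.1):
    """Hypothetical noise stage: flip a fraction p of pixels to 0 or 1."""
    mask = rng.random(img.shape) < p
    noise = (rng.random(img.shape) > 0.5).astype(img.dtype)
    return np.where(mask, noise, img)

def pipeline(img, rng):
    # First part: transformations; second part: noise.
    return salt_and_pepper(slant(img), rng)

rng = np.random.default_rng(0)
img = np.zeros((32, 32))
img[8:24, 14:18] = 1.0   # a crude vertical stroke, grey values in [0, 1]
out = pipeline(img, rng)
```

The output stays a 32$\times$32 array with values in $[0,1]$, so stages of this kind can be chained in any order.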

{\large\bf Transformations}

\vspace*{2mm}

(bottom right) is used as a training example.}
\label{fig:pipeline}
\end{figure}
\fi


\vspace*{-1mm}
\section{Experimental Setup}
\vspace*{-1mm}

Whereas much previous work on deep learning algorithms had been performed on
the MNIST digits classification task~\citep{Hinton06,ranzato-07,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
with 60~000 examples, and variants involving 10~000
examples~\citep{Larochelle-jmlr-toappear-2008,VincentPLarochelleH2008}, we want
to focus here on the case of much larger training sets, from 10
to 1000 times larger. The larger datasets are obtained by first sampling from
a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
and {\bf OCR data} (scanned machine printed characters). Once a character
is sampled from one of these sources (chosen randomly), a pipeline of
\vspace*{-1mm}

%\begin{itemize}
%\item
{\bf NIST.}
Our main source of characters is the NIST Special Database 19~\citep{Grother-1995},
widely used for training and testing character
recognition systems~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002,Milgram+al-2005}.
The dataset is composed of 814255 digits and characters (upper and lower case), with hand-checked classifications,
extracted from handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes
corresponding to ``0''-``9'', ``A''-``Z'' and ``a''-``z''. The dataset contains 8 series of different complexity.
The fourth series, $hsf_4$, experimentally recognized to be the most difficult one, is recommended
by NIST as a testing set and is used for that purpose in our work and in some previous
work~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002,Milgram+al-2005}.
We randomly split the remainder into a training set and a validation set for
model selection. The sizes of these data sets are: 651668 for training, 80000 for validation,
and 82587 for testing.
The performance reported by previous work on that dataset is mostly measured on the digits only.
Here we use all the classes both in the training and testing phases. This is especially
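The split sizes above are consistent with the total: $651668 + 80000 + 82587 = 814255$, with $hsf_4$ held out for testing. A minimal sketch of such a random split over example indices (hypothetical, not the authors' exact procedure):

```python
import numpy as np

n_total, n_test = 814255, 82587          # hsf_4 is the held-out test set
n_valid = 80000
n_train = n_total - n_test - n_valid     # remainder used for training

rng = np.random.default_rng(42)
remainder = rng.permutation(n_total - n_test)  # shuffle the non-test indices
train_idx = remainder[:n_train]
valid_idx = remainder[n_train:]
```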
All data sets contain 32$\times$32 grey-level images (values in $[0,1]$) associated with a label
from one of the 62 character classes.
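The 62-way labelling (``0''-``9'', ``A''-``Z'', ``a''-``z'') maps naturally onto integer class indices. A sketch of one such mapping (the paper does not specify its exact index convention, so the ordering below is an assumption):

```python
import string

# Class index -> character, in the order "0"-"9", "A"-"Z", "a"-"z".
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase

def label_to_char(i):
    """Map an integer class index (0..61) to its character."""
    return CLASSES[i]

def char_to_label(c):
    """Map a character to its integer class index."""
    return CLASSES.index(c)
```

Under this convention the digits occupy indices 0-9, upper-case letters 10-35, and lower-case letters 36-61.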
%\begin{itemize}

%\item
{\bf NIST.} This is the raw NIST Special Database 19~\citep{Grother-1995}.

%\item
{\bf P07.} This dataset is obtained by taking raw characters from all four of the above sources
and sending them through the above transformation pipeline.
To generate each new example, a source is selected with probability $10\%$ from the fonts,