Mercurial > ift6266
comparison: writeup/nips2010_submission.tex @ 501:5927432d8b8d

author:   Yoshua Bengio <bengioy@iro.umontreal.ca>
date:     Tue, 01 Jun 2010 12:28:05 -0400
parents:  8479bf822d0e
children: 2b35a6e5ece4 e837ef6eef8c
51 different representation of the raw visual input. In fact,
52 it was found recently that the features learnt in deep architectures resemble
53 those observed in the first two of these stages (in areas V1 and V2
54 of visual cortex)~\citep{HonglakL2008}, and that they become more and
55 more invariant to factors of variation (such as camera movement) in
56 higher layers~\citep{Goodfellow2009}.
57 Learning a hierarchy of features increases the
58 ease and practicality of developing representations that are at once
59 tailored to specific tasks, yet are able to borrow statistical strength
60 from other related tasks (e.g., modeling different kinds of objects). Finally, learning the
61 feature representation can lead to higher-level (more abstract, more
130 amount of deformation or noise introduced.
131
132 There are two main parts in the pipeline. The first one,
133 from slant to pinch below, performs transformations. The second
134 part, from blur to contrast, adds different kinds of noise.
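The two-part structure (geometric transformations first, then noise) can be sketched as a simple composition of per-step functions. This is only an illustrative sketch, not the paper's implementation: the step functions below are placeholders, the toy "image" is a flat list of grey levels in $[0,1]$, and the `complexity` parameter is an assumed stand-in for the per-example amount of deformation mentioned above.

```python
# Hedged sketch of the two-part pipeline: transformations, then noise.
# A complexity value in [0, 1] controls how strongly each step acts
# (placeholder semantics, not the paper's actual parameterization).

def apply_pipeline(image, transformations, noises, complexity=0.5):
    """First apply geometric transformations, then add noise."""
    for transform in transformations:
        image = transform(image, complexity)
    for noise in noises:
        image = noise(image, complexity)
    return image

# Toy stand-ins: an "image" is a flat list of grey levels in [0, 1].
def slant(img, c):
    # placeholder for the slant transformation
    return [min(1.0, p + 0.1 * c) for p in img]

def clamp_noise(img, c):
    # placeholder noise step that just keeps values inside [0, 1]
    return [max(0.0, min(1.0, p)) for p in img]

out = apply_pipeline([0.0, 0.5], [slant], [clamp_noise], complexity=1.0)
```

The point of the sketch is only the ordering constraint stated in the text: every transformation runs before any noise step.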
135
136 \begin{figure}[h]
137 \resizebox{.99\textwidth}{!}{\includegraphics{images/transfo.png}}\\
138 \caption{Illustration of each transformation applied alone to the same image
139 of an upper-case h (top left). First row (from left to right): original image, slant,
140 thickness, affine transformation (translation, rotation, shear),
141 local elastic deformation; second row (from left to right):
142 pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right):
143 background image, salt and pepper noise, spatially Gaussian noise, scratches,
144 grey level and contrast changes.}
145 \label{fig:transfo}
146 \end{figure}
147
148 {\large\bf Transformations}
149
150 \vspace*{2mm}
151
308 (bottom right) is used as a training example.}
309 \label{fig:pipeline}
310 \end{figure}
311 \fi
312
313
314 \vspace*{-1mm}
315 \section{Experimental Setup}
316 \vspace*{-1mm}
317
318 Whereas much previous work on deep learning algorithms had been performed on
319 the MNIST digits classification task~\citep{Hinton06,ranzato-07,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
320 with 60~000 examples, and variants involving 10~000
321 examples~\citep{Larochelle-jmlr-toappear-2008,VincentPLarochelleH2008}, we want
322 to focus here on the case of much larger training sets, from 10 times to
323 1000 times larger. The larger datasets are obtained by first sampling from
324 a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
325 and {\bf OCR data} (scanned machine-printed characters). Once a character
326 is sampled from one of these sources (chosen randomly), a pipeline of
332 \vspace*{-1mm}
333
334 %\begin{itemize}
335 %\item
336 {\bf NIST.}
337 Our main source of characters is the NIST Special Database 19~\citep{Grother-1995},
338 widely used for training and testing character
339 recognition systems~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002,Milgram+al-2005}.
340 The dataset is composed of 814255 digits and characters (upper and lower case), with hand-checked classifications,
341 extracted from handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes
342 corresponding to "0"-"9", "A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity.
343 The fourth series, $hsf_4$, experimentally recognized to be the most difficult one, is recommended
344 by NIST as a testing set and is used for that purpose in our work and in some previous work~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002,Milgram+al-2005}.
345 We randomly split the remainder into a training set and a validation set for
346 model selection. The sizes of these data sets are: 651668 for training, 80000 for validation,
347 and 82587 for testing.
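The random train/validation split described above can be sketched as follows. Only the resulting set sizes (651668 and 80000, out of the 731668 non-test examples) come from the text; the shuffling procedure and seed below are assumptions for illustration.

```python
import random

# Sketch: split the non-test NIST examples (814255 - 82587 = 731668)
# into 651668 training and 80000 validation examples.
# The seed is an arbitrary assumption; the text only fixes the sizes.
def split_remainder(indices, n_valid=80000, seed=0):
    idx = list(indices)
    random.Random(seed).shuffle(idx)
    return idx[n_valid:], idx[:n_valid]  # (train, valid)

train_idx, valid_idx = split_remainder(range(731668))
```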
348 The performance reported by previous work on that dataset mostly uses only the digits.
349 Here we use all the classes in both the training and testing phases. This is especially
387 All data sets contain 32$\times$32 grey-level images (values in $[0,1]$) associated with a label
388 from one of the 62 character classes.
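The 62-class labeling can be made concrete with a small index-to-character mapping; the ordering (digits, then upper case, then lower case) is an assumption consistent with the "0"-"9", "A"-"Z", "a"-"z" listing above, not something the text fixes.

```python
import string

# 62 classes: "0"-"9", "A"-"Z", "a"-"z" (ordering assumed from the text).
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase

def label_to_char(label):
    """Map a class index in [0, 61] to its character."""
    return CLASSES[label]

def char_to_label(ch):
    """Map a character to its class index."""
    return CLASSES.index(ch)
```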
389 %\begin{itemize}
390
391 %\item
392 {\bf NIST.} This is the raw NIST Special Database 19~\citep{Grother-1995}.
393
394 %\item
395 {\bf P07.} This dataset is obtained by taking raw characters from all four of the above sources
396 and sending them through the above transformation pipeline.
397 To generate each new example, a source is selected with probability $10\%$ from the fonts,