comparison writeup/nips2010_submission.tex @ 499:2b58eda9fc08

changes by Myriam
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Tue, 01 Jun 2010 12:12:52 -0400
parents 5764a2ae1fb5
children 8479bf822d0e
human-level performance on both handwritten digit classification and
62-class handwritten character recognition. For this purpose we
developed a powerful generator of stochastic variations and noise
processes on character images, including not only affine transformations but
also slant, local elastic deformations, changes in thickness, background
images, grey level changes, contrast, occlusion, and various types of pixel and
spatially correlated noise. The out-of-distribution examples are
obtained by training with these highly distorted images or
by including object classes different from those in the target test set.
\end{abstract}
\vspace*{-2mm}
This filter is applied only 15\% of the time. When it is applied, 50\%
of the time only one patch image is generated and applied, in 30\% of
cases two patches are generated, and otherwise three. Each patch is
applied by taking, at each of the 32$\times$32 pixel locations, the
maximal value of the patch and the original image.\\
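The max-compositing rule above can be sketched as follows (a hypothetical re-implementation, not the authors' code; `sample_patch` stands in for whatever generator produces the occluding patch images):

```python
import numpy as np

def apply_occlusion(image, sample_patch, rng=np.random.default_rng()):
    """Sketch of the occlusion filter: skipped 85% of the time; otherwise
    1, 2, or 3 patches (with probability 0.5, 0.3, 0.2) are max-composited
    onto the 32x32 image, pixel by pixel."""
    if rng.random() >= 0.15:           # filter is applied only 15% of the time
        return image
    u = rng.random()
    n_patches = 1 if u < 0.5 else (2 if u < 0.8 else 3)
    out = image.copy()
    for _ in range(n_patches):
        patch = sample_patch()         # assumed: returns a 32x32 array in [0, 1]
        out = np.maximum(out, patch)   # keep the maximal value at each pixel
    return out
```

Because the composite takes a pixel-wise maximum, the output can never be darker than the original image at any location.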
{\bf Grey Level and Contrast Changes.}
This filter changes the contrast and may invert the image polarity (white
on black to black on white). The contrast $C$ is defined here as the
difference between the maximum and the minimum pixel value of the image.
Contrast $\sim U[1-0.85 \times complexity,1]$ (so contrast $\geq 0.15$).
The image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
polarity is inverted with probability $0.5$.
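The contrast rescaling and polarity flip can be written down directly from the definitions above (a minimal sketch, assuming pixel values in $[0,1]$ and $complexity \le 1$):

```python
import numpy as np

def change_contrast(image, complexity, rng=np.random.default_rng()):
    """Sketch of the grey level / contrast filter described above."""
    # Contrast C ~ U[1 - 0.85*complexity, 1], so C >= 0.15 when complexity <= 1.
    c = rng.uniform(1.0 - 0.85 * complexity, 1.0)
    lo, hi = image.min(), image.max()
    unit = (image - lo) / (hi - lo)        # rescale pixel values to [0, 1]
    out = (1.0 - c) / 2.0 + c * unit       # normalize into [(1-C)/2, 1-(1-C)/2]
    if rng.random() < 0.5:                 # invert polarity with probability 0.5
        out = 1.0 - out
    return out
```

By construction the output range is symmetric around grey level $0.5$, with width exactly $C$, whether or not the polarity is inverted.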

\iffalse
\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\
\caption{Illustration of the pipeline of stochastic
transformations applied to the image of a lower-case t
(the upper left image). Each image in the pipeline (going from
left to right, first top line, then bottom line) shows the result
of applying one of the modules in the pipeline. The last image
(bottom right) is used as a training example.}
\label{fig:pipeline}
\end{figure}
\fi

\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/transfo.png}}\\
\caption{Illustration of each transformation applied alone to the same image
of an upper-case h (top left). First row (from left to right): original image, slant,
thickness, affine transformation (translation, rotation, shear),
local elastic deformation; second row (from left to right):
pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right):
background image, salt and pepper noise, spatially correlated Gaussian noise, scratches,
grey level and contrast changes.}
\label{fig:transfo}
\end{figure}


\vspace*{-1mm}
the MNIST digits classification task~\citep{Hinton06,ranzato-07,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
with 60~000 examples, and variants involving 10~000
examples~\citep{Larochelle-jmlr-toappear-2008,VincentPLarochelleH2008}, we want
to focus here on the case of much larger training sets, from 10 to
1000 times larger. The larger datasets are obtained by first sampling from
a {\em data source}: {\bf NIST} (NIST Special Database 19), {\bf Fonts}, {\bf Captchas},
and {\bf OCR data} (scanned machine printed characters). Once a character
is sampled from one of these sources (chosen randomly), a pipeline of
the above transformations and/or noise processes is applied to the
image.
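The generation procedure just described — pick a source at random, sample a character, then run it through the transformation pipeline — can be sketched as follows (hypothetical names throughout; `sources` and `pipeline` stand in for the actual generators and filter modules):

```python
import random

def generate_example(sources, pipeline, rng=random.Random()):
    """Sketch of dataset generation: pick a data source at random,
    sample a character image from it, then pass the image through the
    pipeline of transformation/noise modules.
    `sources` maps names (e.g. 'NIST', 'Fonts') to sampling functions;
    `pipeline` is a list of image -> image transformations."""
    name = rng.choice(sorted(sources))  # choose a source uniformly at random
    image = sources[name]()             # sample one character image from it
    for transform in pipeline:
        image = transform(image)        # e.g. slant, thickness, occlusion, ...
    return image, name
```

Keeping each transformation as an independent image-to-image function is what makes it easy to apply them alone (as in the figure above) or chained, and to scale the dataset simply by drawing more samples.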

\vspace*{-1mm}
\subsection{Data Sources}
\vspace*{-1mm}
