Mercurial > ift6266
writeup/nips2010_submission.tex @ 553:8f6c09d1140f
summary: "ca fitte de nouveau" (French: "it fits again")
author: Yoshua Bengio <bengioy@iro.umontreal.ca>
date: Wed, 02 Jun 2010 17:40:43 -0400
parents: 35c611363291
children: e95395f51d72
the self-taught learning framework.

\vspace*{-1mm}
\section{Perturbation and Transformation of Character Images}
\label{s:perturbations}
\vspace*{-1mm}

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\vspace{1.2cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Original.}
This section describes the different transformations we used to stochastically
transform source images such as the one on the left
in order to obtain data from a distribution
covering a domain substantially larger than that of the clean characters
from which we start. Although character transformations have been used before to
improve character recognizers, this effort is large-scale both
in the number of classes and in the complexity of the transformations, hence
in the complexity of the learning task.
More details can be found in this technical report~\citep{ift6266-tr-anonymous}.
The code for these transformations (mostly Python) is available at
{\tt http://anonymous.url.net}. All the modules in the pipeline share
a global control parameter ($0 \le complexity \le 1$) that allows one to modulate the
amount of deformation or noise introduced.
There are two main parts in the pipeline. The first one,
from slant to pinch below, performs transformations. The second
part, from blur to contrast, adds different kinds of noise.
\end{minipage}
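The two-stage pipeline with its shared $complexity$ knob can be sketched as follows. This is an illustrative sketch only; the module names and signatures are assumptions, not the interface of the released code:

```python
import random

# Illustrative sketch (not the released code): every module receives the same
# global complexity parameter in [0, 1] and returns a new image.
def slant(image, complexity, rng=random):           # stand-in transformation module
    return image

def gaussian_noise(image, complexity, rng=random):  # stand-in noise module
    return image

TRANSFORMATIONS = [slant]       # first part: slant ... pinch
NOISES = [gaussian_noise]       # second part: blur ... contrast

def pipeline(image, complexity, rng=random):
    # All modules share the single global control parameter.
    assert 0.0 <= complexity <= 1.0
    for module in TRANSFORMATIONS + NOISES:
        image = module(image, complexity, rng)
    return image
```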

{\large\bf Transformations}


\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Slant_only.PNG}
\label{fig:Slant}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
%\centering
{\bf Slant.}
Each row of the image is shifted
proportionally to its height: $shift = round(slant \times height)$,
with $slant \sim U[-complexity,complexity]$.
\vspace{1.2cm}
\end{minipage}
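The slant formula above can be sketched directly; the fill value 0.0 for pixels shifted in from outside the image is an assumption (background):

```python
import random

def apply_slant(image, slant):
    """Shift row y horizontally by round(slant * y); hedged sketch of the
    slant module. Vacated positions are filled with 0.0 (assumed background)."""
    out = []
    for y, row in enumerate(image):
        shift = int(round(slant * y))   # shift proportional to the row's height
        if shift >= 0:
            out.append([0.0] * shift + row[:len(row) - shift])
        else:
            out.append(row[-shift:] + [0.0] * (-shift))
    return out

def sample_slant(complexity, rng=random):
    # slant ~ U[-complexity, complexity]
    return rng.uniform(-complexity, complexity)
```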
\includegraphics[scale=.45]{images/Thick_only.PNG}
\label{fig:Thick}
\vspace{.9cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Thickness.}
Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
are applied. The neighborhood of each pixel is multiplied
element-wise with a {\em structuring element} matrix.
The pixel value is replaced by the maximum or the minimum of the resulting
matrix, respectively for dilation or erosion. Ten different structural elements with

\centering
\includegraphics[scale=.45]{images/Affine_only.PNG}
\label{fig:Affine}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Affine Transformations.}
A $2 \times 3$ affine transform matrix (with
6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
Output pixel $(x,y)$ takes the value of the input pixel
nearest to $(ax+by+c,dx+ey+f)$,
producing scaling, translation, rotation and shearing.
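The nearest-pixel lookup above can be sketched as follows; filling out-of-bounds source coordinates with 0.0 (background) is an assumption:

```python
def apply_affine(image, a, b, c, d, e, f):
    """Hedged sketch of the affine module: output pixel (x, y) takes the
    value of the input pixel nearest to (a*x + b*y + c, d*x + e*y + f)."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sx = int(round(a * x + b * y + c))   # nearest-neighbour source column
            sy = int(round(d * x + e * y + f))   # nearest-neighbour source row
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = image[sy][sx]
    return out
```

With $(a,b,c,d,e,f)=(1,0,0,0,1,0)$ this is the identity; perturbing the six parameters yields scaling, translation, rotation and shearing.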
\centering
\includegraphics[scale=.45]{images/Localelasticdistorsions_only.PNG}
\label{fig:Elastic}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Local Elastic Deformations.}
This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short},
which provides more details.
The intensity of the displacement fields is given by
$\alpha = \sqrt[3]{complexity} \times 10.0$; the fields are
convolved with a 2D Gaussian kernel (resulting in a blur) of

\includegraphics[scale=.45]{images/Pinch_only.PNG}
\label{fig:Pinch}
\vspace{.6cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Pinch.}
This is the ``Whirl and pinch'' GIMP filter, with the whirl parameter set to 0.
A pinch is ``similar to projecting the image onto an elastic
surface and pressing or pulling on the center of the surface'' (GIMP documentation manual).
For a square input image, this is akin to drawing a circle of
radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to

\centering
\includegraphics[scale=.45]{images/Motionblur_only.PNG}
\label{fig:MotionBlur}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Motion Blur.}
This is GIMP's ``linear motion blur''
with parameters $length$ and $angle$. The value of
a pixel in the final image is approximately the mean value of the first $length$ pixels
found by moving in the $angle$ direction.
Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.
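The averaging rule just described can be approximated as below; this is a hedged re-implementation of the idea, not GIMP's actual plug-in code (boundary handling is an assumption):

```python
import math
import random

def motion_blur(image, complexity, rng=random):
    """Hedged approximation of linear motion blur: each pixel becomes the
    mean of the first `length` pixels found moving in the `angle` direction."""
    angle = rng.uniform(0.0, 360.0)                  # degrees
    length = abs(rng.gauss(0.0, 3.0 * complexity))   # ~ Normal(0, (3*complexity)^2)
    n = max(1, int(round(length)))
    dx, dy = math.cos(math.radians(angle)), math.sin(math.radians(angle))
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [image[int(round(y + k * dy))][int(round(x + k * dx))]
                    for k in range(n)
                    if 0 <= int(round(x + k * dx)) < w
                    and 0 <= int(round(y + k * dy)) < h]
            # Pixels whose whole ray falls outside keep their value.
            out[y][x] = sum(vals) / len(vals) if vals else image[y][x]
    return out
```

At $complexity = 0$ the sampled length collapses to 0 and the image passes through unchanged.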
\centering
\includegraphics[scale=.45]{images/occlusion_only.PNG}
\label{fig:Occlusion}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Occlusion.}
This filter selects a random rectangle from an {\em occluder} character
image and places it over the original {\em occluded}
image. Pixels are combined by taking $\max(occluder,occluded)$,
i.e.\ the value closer to black. The rectangle corners
are sampled so that larger complexity gives larger rectangles.
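The max-combination can be sketched as follows; the convention that higher pixel values are darker (closer to black ink) is assumed from the text:

```python
def occlude(occluded, patch, px, py):
    """Hedged sketch: paste a rectangle `patch` (cut from an occluder
    character image) onto `occluded` at (px, py), combining with max(),
    i.e. keeping whichever pixel is closer to black (assumed: larger = darker)."""
    out = [row[:] for row in occluded]   # leave the input image untouched
    for j, prow in enumerate(patch):
        for i, p in enumerate(prow):
            ty, tx = py + j, px + i
            if 0 <= ty < len(out) and 0 <= tx < len(out[0]):
                out[ty][tx] = max(out[ty][tx], p)
    return out
```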
\centering
\includegraphics[scale=.45]{images/Permutpixel_only.PNG}
\label{fig:PixelPermutation}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Pixel Permutation.}
This filter permutes neighbouring pixels. It first selects a
fraction $\frac{complexity}{3}$ of the pixels randomly in the image. Each of them is then
sequentially exchanged with another pixel in its $V4$ neighbourhood.
This filter is skipped with probability 80\%.
\vspace{.8cm}
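A minimal sketch of this permutation, assuming $V4$ means the 4-connected neighbourhood:

```python
import random

def permute_pixels(image, complexity, rng=random):
    """Hedged sketch: swap a fraction complexity/3 of the pixels with a
    random 4-connected (V4) neighbour; the filter is skipped 80% of the time."""
    if rng.random() < 0.8:
        return image
    h, w = len(image), len(image[0])
    coords = [(y, x) for y in range(h) for x in range(w)]
    for y, x in rng.sample(coords, int(len(coords) * complexity / 3.0)):
        nbrs = [(y + dy, x + dx)
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= y + dy < h and 0 <= x + dx < w]
        ny, nx = rng.choice(nbrs)
        image[y][x], image[ny][nx] = image[ny][nx], image[y][x]
    return image
```

Because pixels are only exchanged, the multiset of pixel values is always preserved.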
\centering
\includegraphics[scale=.45]{images/Distorsiongauss_only.PNG}
\label{fig:GaussianNoise}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Gaussian Noise.}
This filter simply adds, to each pixel of the image independently,
noise $\sim {\rm Normal}(0,(\frac{complexity}{10})^2)$.
This filter is skipped with probability 70\%.
\vspace{1.1cm}
\end{minipage}
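This module is short enough to sketch in full:

```python
import random

def add_gaussian_noise(image, complexity, rng=random):
    """Sketch: add independent Normal(0, (complexity/10)^2) noise to each
    pixel; the whole filter is skipped with probability 70%."""
    if rng.random() < 0.7:
        return image
    sigma = complexity / 10.0
    return [[p + rng.gauss(0.0, sigma) for p in row] for row in image]
```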
\centering
\includegraphics[scale=.45]{images/background_other_only.png}
\label{fig:BackgroundImages}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Background Images.}
Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
background behind the letter, from a randomly chosen natural image,
with contrast adjustments depending on $complexity$, to preserve
more or less of the original character image.
\vspace{.8cm}
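The exact contrast rule is not specified in the text, so the sketch below is a loose assumption: the background's contrast is scaled by a factor drawn up to $complexity$, and the darker pixel (assuming 1 = ink) wins:

```python
import random

def add_background(char_img, bg_img, complexity, rng=random):
    """Very hedged sketch (the paper's exact contrast adjustment is
    unspecified): scale background contrast by c ~ U[0, complexity], then
    keep the darker of character and background (assumed: larger = darker)."""
    c = rng.uniform(0.0, complexity)
    return [[max(cp, c * bp) for cp, bp in zip(crow, brow)]
            for crow, brow in zip(char_img, bg_img)]
```

At $complexity = 0$ the background vanishes and the character is preserved exactly.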
\centering
\includegraphics[scale=.45]{images/Poivresel_only.PNG}
\label{fig:SaltPepper}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Salt and Pepper Noise.}
This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
The fraction of selected pixels is $0.2 \times complexity$.
This filter is skipped with probability 75\%.
\vspace{.9cm}
\end{minipage}
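A sketch of this filter; replacing (rather than adding to) the selected pixels is an assumption about the original implementation:

```python
import random

def salt_and_pepper(image, complexity, rng=random):
    """Hedged sketch: a fraction 0.2 * complexity of the pixels receives a
    fresh U[0,1] value; the filter is skipped with probability 75%."""
    if rng.random() < 0.75:
        return image
    h, w = len(image), len(image[0])
    coords = [(y, x) for y in range(h) for x in range(w)]
    for y, x in rng.sample(coords, int(0.2 * complexity * len(coords))):
        image[y][x] = rng.random()   # noise value ~ U[0, 1]
    return image
```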
\includegraphics[scale=.45]{images/Bruitgauss_only.PNG}
\label{fig:SpatiallyGaussian}
\vspace{.5cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Spatially Gaussian Noise.}
Different regions of the image are spatially smoothed by convolving
the image with a symmetric Gaussian kernel of
size and variance chosen uniformly in the ranges $[12,12 + 20 \times
complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
between $0$ and $1$. We also create a symmetric averaging window, of the

\includegraphics[scale=.45]{images/Rature_only.PNG}
\label{fig:Scratches}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
\vspace{.4cm}
{\bf Scratches.}
The scratches module places line-like white patches on the image. The
lines are heavily transformed images of the digit ``1'' (one), chosen
at random among 500 such 1 images,
randomly cropped and rotated by an angle $\sim {\rm Normal}(0,(100 \times
complexity)^2)$, using bi-cubic interpolation.
\centering
\includegraphics[scale=.45]{images/Contrast_only.PNG}
\label{fig:Contrast}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Grey Level and Contrast Changes.}
This filter changes the contrast and may invert the image polarity (white
to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$,
so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
polarity is inverted with probability 50\%.
\vspace{.7cm}
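The normalization into $[\frac{1-C}{2},\,1-\frac{1-C}{2}]$ is an affine map of $[0,1]$ onto an interval of width exactly $C$, which this sketch makes explicit:

```python
import random

def change_contrast(image, complexity, rng=random):
    """Sketch of the contrast module: sample C ~ U[1 - 0.85*complexity, 1],
    map [0,1] pixel values into [(1-C)/2, 1-(1-C)/2] (an interval of width C),
    and invert the polarity with probability 50%."""
    C = rng.uniform(1.0 - 0.85 * complexity, 1.0)
    lo = (1.0 - C) / 2.0
    out = [[lo + p * C for p in row] for row in image]   # hi - lo == C
    if rng.random() < 0.5:
        out = [[1.0 - p for p in row] for row in out]    # polarity inversion
    return out
```

At $complexity = 0$, $C = 1$ and the filter reduces to identity or pure inversion.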
\label{fig:pipeline}
\end{figure}
\fi


\vspace*{-2mm}
\section{Experimental Setup}
\vspace*{-1mm}

Much previous work on deep learning had been performed on
the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
with 60~000 examples, and variants involving 10~000
examples~\citep{Larochelle-jmlr-toappear-2008,VincentPLarochelleH2008}.
The focus here is on much larger training sets, from 10
to 1000 times larger, and on 62 classes.
The first step in constructing the larger datasets (called NISTP and P07) is to sample from
a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
and {\bf OCR data} (scanned machine-printed characters). Once a character
is sampled from one of these sources (chosen randomly), the second step is to

of money to perform tasks for which human intelligence is required.
Mechanical Turk has been used extensively in natural language processing and vision.
%processing \citep{SnowEtAl2008} and vision
%\citep{SorokinAndForsyth2008,whitehill09}.
AMT users were presented
with 10 character images (from a test set) and asked to choose 10 corresponding ASCII
characters. They were forced to make a hard choice among the
62 or 10 character classes (all classes or digits only).
80 subjects classified 2500 images per (dataset, task) pair,
with the guarantee that 3 different subjects classified each image, allowing
us to estimate inter-human variability (e.g.\ a standard error of 0.1\%
on the average 18.2\% error made by humans on the 62-class task on the NIST test set).

\vspace*{-3mm}
\subsection{Data Sources}
\vspace*{-2mm}

%\begin{itemize}
%\item
{\bf NIST.}
Our main source of characters is the NIST Special Database 19~\citep{Grother-1995},

for that purpose. We randomly split the remainder (731668 examples) into a training set and a validation set for
model selection.
The performances reported by previous work on that dataset mostly use only the digits.
Here we use all the classes, both in the training and testing phases. This is especially
useful to estimate the effect of a multi-task setting.
The distribution of the classes in the NIST training and test sets differs
substantially, with relatively many more digits in the test set, and a more uniform distribution
of letters in the test set (where the letters are distributed
more like in natural text).
\vspace*{-1mm}

%\item
{\bf Fonts.}
In order to have a good variety of sources we downloaded a large number of free fonts from:

\vspace*{-1mm}

%\item
{\bf OCR data.}
A large set (2 million) of scanned, OCRed and manually verified machine-printed
characters was included as an
additional source. This set is part of a larger corpus being collected by the Image Understanding
Pattern Recognition Research group led by Thomas Breuel at the University of Kaiserslautern
({\tt http://www.iupr.com}), and which will be publicly released.
%TODO: let's hope that Thomas is not a reviewer! :) Seriously though, maybe we should anonymize this
%\end{itemize}

\vspace*{-3mm}
\subsection{Data Sets}
\vspace*{-2mm}
All data sets contain 32$\times$32 grey-level images (values in $[0,1]$) associated with a label
from one of the 62 character classes.
%\begin{itemize}
\vspace*{-1mm}

transformed but no additional noise is added to the image, giving images
closer to the NIST dataset.
It has \{81920000 / 80000 / 20000\} \{training / validation / test\} examples.
%\end{itemize}

\vspace*{-3mm}
\subsection{Models and their Hyperparameters}
\vspace*{-2mm}

The experiments are performed with Multi-Layer Perceptrons (MLP) with a single
hidden layer and with Stacked Denoising Auto-Encoders (SDA).
\emph{Hyper-parameters are selected based on the {\bf NISTP} validation set error.}

{\bf Multi-Layer Perceptrons (MLP).}
Whereas previous work had compared deep architectures to both shallow MLPs and
SVMs, we only compared to MLPs here because of the very large datasets used
(making the use of SVMs computationally challenging because of their quadratic