comparison writeup/nips2010_submission.tex @ 553:8f6c09d1140f

it fits again
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Wed, 02 Jun 2010 17:40:43 -0400
parents 35c611363291
children e95395f51d72
the self-taught learning framework.

\vspace*{-1mm}
\section{Perturbation and Transformation of Character Images}
\label{s:perturbations}
\vspace*{-1mm}

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Original.PNG}
\label{fig:Original}
\vspace{1.2cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Original.}
This section describes the different transformations we used to stochastically
transform source images such as the one on the left
in order to obtain data from a larger distribution which
covers a domain substantially larger than the clean characters distribution from
which we start. Although character transformations have been used before to
improve character recognizers, this effort is on a large scale both
in the number of classes and in the complexity of the transformations, hence
in the complexity of the learning task.
More details can
be found in this technical report~\citep{ift6266-tr-anonymous}.
The code for these transformations (mostly Python) is available at
{\tt http://anonymous.url.net}. All the modules in the pipeline share
a global control parameter ($0 \le complexity \le 1$) that allows one to modulate the
amount of deformation or noise introduced.

There are two main parts in the pipeline. The first one,
from slant to pinch below, performs transformations. The second
part, from blur to contrast, adds different kinds of noise.
\end{minipage}
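The shared control parameter can be sketched in Python. `apply_pipeline` and `toy_noise` are hypothetical names (the released code is not reproduced here), and the image is a flat list of grey levels in [0, 1] for brevity:

```python
import random

def apply_pipeline(image, modules, complexity, seed=0):
    # All modules share the single global control parameter
    # 0 <= complexity <= 1 that modulates deformation/noise.
    if not 0.0 <= complexity <= 1.0:
        raise ValueError("complexity must lie in [0, 1]")
    rng = random.Random(seed)
    for module in modules:
        image = module(image, complexity, rng)
    return image

def toy_noise(image, complexity, rng):
    # Illustrative stand-in for a real module: perturb each pixel by
    # at most `complexity`, clipped back to the [0, 1] grey-level range.
    return [min(1.0, max(0.0, p + rng.uniform(-complexity, complexity)))
            for p in image]

clean = [0.0, 0.5, 1.0]
untouched = apply_pipeline(clean, [toy_noise], complexity=0.0)
noisy = apply_pipeline(clean, [toy_noise], complexity=0.5)
```

At `complexity = 0` every module degenerates to the identity, which is what makes a single global knob workable across heterogeneous modules.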

{\large\bf Transformations}

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.45]{images/Slant_only.PNG}
\label{fig:Slant}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
%\centering
{\bf Slant.}
Each row of the image is shifted
proportionally to its height: $shift = round(slant \times height)$.
$slant \sim U[-complexity,complexity]$.
\vspace{1.2cm}
\end{minipage}
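A minimal Python sketch of this module, assuming `image` is a list of rows of grey levels in [0, 1] and reading "height" as the row's vertical coordinate:

```python
import random

def slant(image, complexity, rng):
    # Shift row y horizontally by round(slant * y),
    # with slant ~ U[-complexity, complexity].
    s = rng.uniform(-complexity, complexity)
    width = len(image[0])
    out = []
    for y, row in enumerate(image):
        shift = max(-width, min(width, int(round(s * y))))
        if shift >= 0:   # shift right, pad with background (0.0)
            out.append([0.0] * shift + row[:width - shift])
        else:            # shift left
            out.append(row[-shift:] + [0.0] * (-shift))
    return out

img = [[1.0] * 8 for _ in range(8)]
tilted = slant(img, complexity=0.8, rng=random.Random(3))
```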
\includegraphics[scale=.45]{images/Thick_only.PNG}
\label{fig:Thick}
\vspace{.9cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Thickness.}
Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
are applied. The neighborhood of each pixel is multiplied
element-wise with a {\em structuring element} matrix.
The pixel value is replaced by the maximum or the minimum of the resulting
matrix, respectively for dilation or erosion. Ten different structural elements with
\centering
\includegraphics[scale=.45]{images/Affine_only.PNG}
\label{fig:Affine}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Affine Transformations.}
A $2 \times 3$ affine transform matrix (with
6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
Output pixel $(x,y)$ takes the value of the input pixel
nearest to $(ax+by+c,dx+ey+f)$,
producing scaling, translation, rotation and shearing.
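A sketch of the nearest-neighbour mapping. The text does not give the sampling ranges for $(a,b,c,d,e,f)$, so the ones below are illustrative; at complexity 0 the transform reduces to the identity:

```python
import random

def affine(image, complexity, rng):
    # Output pixel (x, y) takes the value of the input pixel nearest to
    # (a*x + b*y + c, d*x + e*y + f). Sampling ranges are assumptions.
    a = 1.0 + rng.uniform(-complexity, complexity)   # scale
    b = rng.uniform(-complexity, complexity)         # shear
    c = rng.uniform(-complexity, complexity) * 3.0   # translation
    d = rng.uniform(-complexity, complexity)
    e = 1.0 + rng.uniform(-complexity, complexity)
    f = rng.uniform(-complexity, complexity) * 3.0
    n = len(image)
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            sx = int(round(a * x + b * y + c))
            sy = int(round(d * x + e * y + f))
            if 0 <= sx < n and 0 <= sy < n:   # outside -> background
                out[y][x] = image[sy][sx]
    return out

img = [[(x + y) / 14.0 for x in range(8)] for y in range(8)]
warped = affine(img, complexity=0.4, rng=random.Random(1))
```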
\centering
\includegraphics[scale=.45]{images/Localelasticdistorsions_only.PNG}
\label{fig:Elastic}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Local Elastic Deformations.}
This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short},
which provides more details.
The intensity of the displacement fields is given by
$\alpha = \sqrt[3]{complexity} \times 10.0$; the fields are
convolved with a Gaussian 2D kernel (resulting in a blur) of
\includegraphics[scale=.45]{images/Pinch_only.PNG}
\label{fig:Pinch}
\vspace{.6cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Pinch.}
This is the ``Whirl and pinch'' GIMP filter, with whirl set to 0.
A pinch is ``similar to projecting the image onto an elastic
surface and pressing or pulling on the center of the surface'' (GIMP documentation manual).
For a square input image, this is akin to drawing a circle of
radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
\centering
\includegraphics[scale=.45]{images/Motionblur_only.PNG}
\label{fig:MotionBlur}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Motion Blur.}
This is GIMP's ``linear motion blur''
with parameters $length$ and $angle$. The value of
a pixel in the final image is approximately the mean value of the first $length$ pixels
found by moving in the $angle$ direction.
Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.
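The averaging step can be sketched as follows. A length must be non-negative, so taking the absolute value of the Normal sample is our assumption:

```python
import math, random

def motion_blur(image, complexity, rng):
    # Each output pixel is the mean of the first `length` pixels found
    # by stepping from it in the `angle` direction; steps that leave
    # the image are simply ignored in the mean.
    angle = math.radians(rng.uniform(0.0, 360.0))
    length = int(abs(rng.gauss(0.0, 3.0 * complexity)))
    if length <= 1:
        return [row[:] for row in image]
    dx, dy = math.cos(angle), math.sin(angle)
    n = len(image)
    out = [[0.0] * n for _ in range(n)]
    for y in range(n):
        for x in range(n):
            vals = []
            for k in range(length):
                px, py = int(round(x + k * dx)), int(round(y + k * dy))
                if 0 <= px < n and 0 <= py < n:
                    vals.append(image[py][px])
            out[y][x] = sum(vals) / len(vals) if vals else image[y][x]
    return out

img = [[float(x % 2) for x in range(8)] for y in range(8)]
blurred = motion_blur(img, complexity=1.0, rng=random.Random(7))
```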
\centering
\includegraphics[scale=.45]{images/occlusion_only.PNG}
\label{fig:Occlusion}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Occlusion.}
Selects a random rectangle from an {\em occluder} character
image and places it over the original {\em occluded}
image. Pixels are combined by taking $\max(occluder,occluded)$,
i.e.\ the value closer to black. The rectangle corners
are sampled so that larger complexity gives larger rectangles.
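The exact corner-sampling rule is not given above, so this sketch pastes a square whose side grows with complexity, at its source position; with ink = 1, the per-pixel max keeps whichever stroke is present:

```python
import random

def occlude(occluded, occluder, complexity, rng):
    # Paste a square patch of `occluder` onto `occluded`, combining
    # pixels with max(). Side length growing linearly with complexity
    # is an illustrative choice, not the paper's sampling rule.
    n = len(occluded)
    side = max(1, int(round(complexity * n)))
    x0 = rng.randrange(0, n - side + 1)
    y0 = rng.randrange(0, n - side + 1)
    out = [row[:] for row in occluded]
    for y in range(y0, y0 + side):
        for x in range(x0, x0 + side):
            out[y][x] = max(out[y][x], occluder[y][x])
    return out

base = [[0.2 for _ in range(8)] for _ in range(8)]
other = [[0.9 for _ in range(8)] for _ in range(8)]
merged = occlude(base, other, complexity=0.5, rng=random.Random(2))
```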
\centering
\includegraphics[scale=.45]{images/Permutpixel_only.PNG}
\label{fig:PixelPermutation}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Pixel Permutation.}
This filter permutes neighbouring pixels. It first selects a
fraction $\frac{complexity}{3}$ of the pixels randomly in the image. Each of them is then
sequentially exchanged with another pixel in its $V4$ neighbourhood.
This filter is skipped with probability 80\%.
\vspace{.8cm}
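Since the filter only swaps pixels, it preserves the multiset of grey levels, a useful invariant to check. A sketch under the stated selection rule:

```python
import random

def permute_pixels(image, complexity, rng):
    # Skipped with probability 0.8; otherwise a fraction complexity/3
    # of pixels is picked and each is swapped with a random V4
    # (up/down/left/right) neighbour.
    if rng.random() < 0.8:
        return [row[:] for row in image]
    n = len(image)
    out = [row[:] for row in image]
    coords = [(x, y) for y in range(n) for x in range(n)]
    chosen = rng.sample(coords, int(len(coords) * complexity / 3.0))
    for x, y in chosen:
        dx, dy = rng.choice([(1, 0), (-1, 0), (0, 1), (0, -1)])
        nx, ny = x + dx, y + dy
        if 0 <= nx < n and 0 <= ny < n:
            out[y][x], out[ny][nx] = out[ny][nx], out[y][x]
    return out

img = [[(x * 8 + y) / 63.0 for x in range(8)] for y in range(8)]
shuffled = permute_pixels(img, complexity=1.0, rng=random.Random(5))
```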
\centering
\includegraphics[scale=.45]{images/Distorsiongauss_only.PNG}
\label{fig:GaussianNoise}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Gaussian Noise.}
This filter simply adds, to each pixel of the image independently, a
noise $\sim Normal(0,(\frac{complexity}{10})^2)$.
This filter is skipped with probability 70\%.
\vspace{1.1cm}
\end{minipage}
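This one is fully specified and fits in a few lines; clipping back to [0, 1] after adding the noise is our assumption:

```python
import random

def gaussian_noise(image, complexity, rng):
    # Skipped with probability 0.7; otherwise add independent
    # Normal(0, (complexity/10)^2) noise to every pixel.
    if rng.random() < 0.7:
        return [row[:] for row in image]
    sigma = complexity / 10.0
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]

img = [[0.5 for _ in range(8)] for _ in range(8)]
noisy = gaussian_noise(img, complexity=1.0, rng=random.Random(4))
```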
\centering
\includegraphics[scale=.45]{images/background_other_only.png}
\label{fig:BackgroundImages}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Background Images.}
Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
background behind the letter, from a randomly chosen natural image,
with contrast adjustments depending on $complexity$, to preserve
more or less of the original character image.
\vspace{.8cm}
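A heavily simplified sketch: a random texture stands in for the natural image, and the contrast-adjustment rule (scaling the background by complexity before a per-pixel max, with ink = 1) is entirely our assumption:

```python
import random

def add_background(char_img, natural_img, complexity):
    # With ink = 1, a per-pixel max keeps the strokes on top of the
    # background patch; scaling the background by complexity preserves
    # more of the original character at low complexity.
    strength = 0.9 * complexity
    return [[max(c, b * strength) for c, b in zip(crow, brow)]
            for crow, brow in zip(char_img, natural_img)]

rng = random.Random(6)
char = [[1.0 if (2 <= x <= 5 and y == 4) else 0.0 for x in range(8)]
        for y in range(8)]
texture = [[rng.random() for _ in range(8)] for _ in range(8)]  # stand-in
mixed = add_background(char, texture, complexity=0.7)
```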
\centering
\includegraphics[scale=.45]{images/Poivresel_only.PNG}
\label{fig:SaltPepper}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Salt and Pepper Noise.}
This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
The fraction of selected pixels is $0.2 \times complexity$.
This filter is skipped with probability 75\%.
\vspace{.9cm}
\end{minipage}
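A sketch of this filter; replacing (rather than adding to) the selected pixels with a U[0, 1] draw, as is usual for salt-and-pepper noise, is our reading:

```python
import random

def salt_and_pepper(image, complexity, rng):
    # Skipped with probability 0.75; otherwise a fraction
    # 0.2 * complexity of the pixels is replaced by noise ~ U[0, 1].
    if rng.random() < 0.75:
        return [row[:] for row in image]
    n = len(image)
    out = [row[:] for row in image]
    coords = [(x, y) for y in range(n) for x in range(n)]
    for x, y in rng.sample(coords, int(0.2 * complexity * len(coords))):
        out[y][x] = rng.random()
    return out

img = [[0.5 for _ in range(8)] for _ in range(8)]
speckled = salt_and_pepper(img, complexity=1.0, rng=random.Random(9))
```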
\includegraphics[scale=.45]{images/Bruitgauss_only.PNG}
\label{fig:SpatiallyGaussian}
\vspace{.5cm}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Spatially Gaussian Noise.}
Different regions of the image are spatially smoothed by convolving
the image with a symmetric Gaussian kernel of
size and variance chosen uniformly in the ranges $[12,12 + 20 \times
complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
between $0$ and $1$. We also create a symmetric averaging window, of the
\includegraphics[scale=.45]{images/Rature_only.PNG}
\label{fig:Scratches}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
\vspace{.4cm}
{\bf Scratches.}
The scratches module places line-like white patches on the image. The
lines are heavily transformed images of the digit ``1'' (one), chosen
at random among 500 such 1 images,
randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
complexity)^2)$, using bi-cubic interpolation.
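A rough sketch of the idea only: instead of the 500 transformed digit-1 images, a plain line segment through the image centre stands in for the scratch, rotated by the stated Normal-distributed angle:

```python
import math, random

def scratch(image, complexity, rng):
    # Overlay a line-like patch (ink = 1 in this sketch). A straight
    # segment replaces the paper's transformed '1' images; the angle
    # distribution Normal(0, (100*complexity)^2) degrees is as stated.
    n = len(image)
    angle = math.radians(rng.gauss(0.0, 100.0 * complexity))
    cx = cy = (n - 1) / 2.0
    out = [row[:] for row in image]
    for t in range(-n, n + 1):
        x = int(round(cx + t * math.cos(angle) / 2.0))
        y = int(round(cy + t * math.sin(angle) / 2.0))
        if 0 <= x < n and 0 <= y < n:
            out[y][x] = 1.0
    return out

img = [[0.0 for _ in range(8)] for _ in range(8)]
scratched = scratch(img, complexity=0.3, rng=random.Random(11))
```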
\centering
\includegraphics[scale=.45]{images/Contrast_only.PNG}
\label{fig:Contrast}
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
{\bf Grey Level and Contrast Changes.}
This filter changes the contrast and may invert the image polarity (white
to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$,
so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
polarity is inverted with probability 50\%.
\vspace{.7cm}
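This filter is fully specified by the formulas above and maps pixels affinely into the stated interval:

```python
import random

def contrast_change(image, complexity, rng):
    # Contrast C ~ U[1 - 0.85*complexity, 1]; pixels are mapped into
    # [(1-C)/2, 1-(1-C)/2], and polarity is inverted with prob. 0.5.
    c = rng.uniform(1.0 - 0.85 * complexity, 1.0)
    lo, hi = (1.0 - c) / 2.0, 1.0 - (1.0 - c) / 2.0
    out = [[lo + p * (hi - lo) for p in row] for row in image]
    if rng.random() < 0.5:   # invert polarity
        out = [[1.0 - p for p in row] for row in out]
    return out

img = [[x / 7.0 for x in range(8)] for _ in range(8)]
adjusted = contrast_change(img, complexity=0.6, rng=random.Random(8))
```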
\label{fig:pipeline}
\end{figure}
\fi


\vspace*{-2mm}
\section{Experimental Setup}
\vspace*{-1mm}

Much previous work on deep learning had been performed on
the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
with 60~000 examples, and variants involving 10~000
examples~\citep{Larochelle-jmlr-toappear-2008,VincentPLarochelleH2008}.
The focus here is on much larger training sets, from 10
to 1000 times larger, and on 62 classes.

The first step in constructing the larger datasets (called NISTP and P07) is to sample from
a {\em data source}: {\bf NIST} (NIST database 19), {\bf Fonts}, {\bf Captchas},
and {\bf OCR data} (scanned machine-printed characters). Once a character
is sampled from one of these sources (chosen randomly), the second step is to
of money to perform tasks for which human intelligence is required.
Mechanical Turk has been used extensively in natural language processing and vision.
%processing \citep{SnowEtAl2008} and vision
%\citep{SorokinAndForsyth2008,whitehill09}.
AMT users were presented
with 10 character images (from a test set) and asked to choose 10 corresponding ASCII
characters. They were forced to make a hard choice among the
62 or 10 character classes (all classes or digits only).
80 subjects classified 2500 images per (dataset, task) pair,
with the guarantee that 3 different subjects classified each image, allowing
us to estimate inter-human variability (e.g.\ a standard error of 0.1\%
on the average 18.2\% error made by humans on the 62-class task NIST test set).

\vspace*{-3mm}
\subsection{Data Sources}
\vspace*{-2mm}

%\begin{itemize}
%\item
{\bf NIST.}
Our main source of characters is the NIST Special Database 19~\citep{Grother-1995},
for that purpose. We randomly split the remainder (731668 examples) into a training set and a validation set for
model selection.
The performances reported by previous work on that dataset mostly use only the digits.
Here we use all the classes, both in the training and testing phases. This is especially
useful to estimate the effect of a multi-task setting.
The distribution of the classes in the NIST training and test sets differs
substantially, with relatively many more digits in the test set, and a more uniform distribution
of letters in the test set, whereas in the training set the letters are distributed
more like in natural text.
\vspace*{-1mm}

%\item
{\bf Fonts.}
In order to have a good variety of sources we downloaded a large number of free fonts from:
\vspace*{-1mm}

%\item
{\bf OCR data.}
A large set (2 million) of scanned, OCRed and manually verified machine-printed
characters were included as an
additional source. This set is part of a larger corpus being collected by the Image Understanding
Pattern Recognition Research group led by Thomas Breuel at University of Kaiserslautern
({\tt http://www.iupr.com}), and will be publicly released.
%TODO: let's hope that Thomas is not a reviewer! :) Seriously though, maybe we should anonymize this
%\end{itemize}

\vspace*{-3mm}
\subsection{Data Sets}
\vspace*{-2mm}

All data sets contain 32$\times$32 grey-level images (values in $[0,1]$) associated with a label
from one of the 62 character classes.
%\begin{itemize}
\vspace*{-1mm}
transformed but no additional noise is added to the image, giving images
closer to the NIST dataset.
It has \{81920000 / 80000 / 20000\} \{training / validation / test\} examples.
%\end{itemize}

\vspace*{-3mm}
\subsection{Models and their Hyperparameters}
\vspace*{-2mm}

The experiments are performed with Multi-Layer Perceptrons (MLP) with a single
hidden layer and with Stacked Denoising Auto-Encoders (SDA).
\emph{Hyper-parameters are selected based on the {\bf NISTP} validation set error.}

{\bf Multi-Layer Perceptrons (MLP).}
Whereas previous work had compared deep architectures to both shallow MLPs and
SVMs, we only compare to MLPs here because of the very large datasets used
(making the use of SVMs computationally challenging because of their quadratic