comparison writeup/techreport.tex @ 463:5fa1c653620c

added small information on NISTP
author Xavier Glorot <glorotxa@iro.umontreal.ca>
date Fri, 28 May 2010 19:07:14 -0400
parents f59af1648d83
children 534d4ecf1bd1
462:f59af1648d83 463:5fa1c653620c
447 The dataset P07 is sampled with our transformation pipeline with a complexity parameter of $0.7$. 447 The dataset P07 is sampled with our transformation pipeline with a complexity parameter of $0.7$.
448 For each new example to generate, we choose one source with the following probabilities: $0.1$ for the fonts, 448 For each new example to generate, we choose one source with the following probabilities: $0.1$ for the fonts,
449 $0.25$ for the captchas, $0.25$ for OCR data and $0.4$ for NIST. We apply all the transformations in their order 449 $0.25$ for the captchas, $0.25$ for OCR data and $0.4$ for NIST. We apply all the transformations in their order
450 and for each of them we sample uniformly a complexity in the range $[0,0.7]$. 450 and for each of them we sample uniformly a complexity in the range $[0,0.7]$.
451 \item {\bf NISTP} {\em do not use PNIST but NISTP, to remain politically correct...} 451 \item {\bf NISTP} {\em do not use PNIST but NISTP, to remain politically correct...}
452 NISTP is equivalent to P07 except that we only apply transformations from slant to pinch. Therefore, the character is transformed 452 NISTP is equivalent to P07 (complexity parameter of $0.7$ with the same source proportions) except that we only apply transformations from slant to pinch. Therefore, the character is transformed
453 but no additional noise is added to the image; this gives images closer to the NIST dataset. 453 but no additional noise is added to the image; this gives images closer to the NIST dataset.
454 \end{itemize} 454 \end{itemize}
455 455
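The sampling procedure described for P07 and NISTP can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the report's actual pipeline code: the function names `sample_plan` and `nistp_subset` are hypothetical, and the pipeline list in the usage note contains placeholder entries, since the text names only the slant and pinch transformations. Only the source probabilities and the $[0,0.7]$ complexity range come from the text.

```python
import random

# Source mixture stated in the text for P07 (NISTP reuses the same proportions).
SOURCE_PROBS = {"fonts": 0.1, "captchas": 0.25, "ocr": 0.25, "nist": 0.4}

# Complexity parameter of P07 and NISTP, as stated in the text.
MAX_COMPLEXITY = 0.7


def sample_plan(transformations, rng):
    """Pick one source and one complexity per transformation (hypothetical helper).

    `transformations` is an ordered list of transformation names; the order
    is preserved because the text says they are applied in their order.
    """
    names = list(SOURCE_PROBS)
    weights = [SOURCE_PROBS[name] for name in names]
    source = rng.choices(names, weights=weights, k=1)[0]
    # Each transformation gets its own complexity, drawn uniformly in [0, 0.7].
    plan = [(t, rng.uniform(0.0, MAX_COMPLEXITY)) for t in transformations]
    return source, plan


def nistp_subset(transformations, first="slant", last="pinch"):
    """Keep only the transformations from `first` to `last`, inclusive."""
    start = transformations.index(first)
    stop = transformations.index(last)
    return transformations[start:stop + 1]
```

For example, with a placeholder pipeline such as `["slant", "placeholder_mid", "pinch", "placeholder_noise"]`, calling `sample_plan(pipeline, random.Random(0))` gives a P07-style plan, while `sample_plan(nistp_subset(pipeline), random.Random(0))` gives a NISTP-style plan that stops at pinch and so skips the later noise steps.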
456 We noticed that the distributions of the training sets and the test sets differ. 456 We noticed that the distributions of the training sets and the test sets differ.
457 Since our validation sets are sampled from the training set, they have approximately the same distribution, but the test set has a completely different distribution as illustrated in figure \ref {setsdata}. 457 Since our validation sets are sampled from the training set, they have approximately the same distribution, but the test set has a completely different distribution as illustrated in figure \ref {setsdata}.