diff writeup/techreport.tex @ 434:310c730516af

added description of nist19 and captcha data sources
author goldfinger
date Mon, 03 May 2010 03:08:34 -0400
parents e2fd928a7de0
children 479f2f518fc9
--- a/writeup/techreport.tex	Mon May 03 02:44:11 2010 -0400
+++ b/writeup/techreport.tex	Mon May 03 03:08:34 2010 -0400
@@ -248,12 +248,13 @@
 
 \begin{itemize}
 \item {\bf NIST}
-The NIST Special Database 19 (NIST19) \ref{Grother} is a very widely used dataset for training and testing OCR systems. The dataset is 
-composed with over 800 000 digits and characters (upper and lower cases), with hand checked classifications, extracted from
-handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes corresponding to "0"-"9",
-"A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity. The fourth series, $hsf_4$, 
-experimentally recognized to be the most difficult one for classification task is recommended by NIST as testing set and is
-used in our work for that purpose. The performances reported by previous work on that dataset mostly use only the digits.
+The NIST Special Database 19 (NIST19) is a widely used dataset for training and testing OCR systems.
+The dataset is composed of over 800 000 digits and characters (upper and lower case), with hand-checked classifications,
+extracted from the handwritten sample forms of 3600 writers. Each character is labelled with one of 62 classes
+corresponding to ``0''--``9'', ``A''--``Z'' and ``a''--``z''. The dataset contains 8 series of varying complexity.
+The fourth series, $hsf_4$, found experimentally to be the most difficult one for classification, is recommended
+by NIST as a test set and is used in our work for that purpose.
+Most results reported in previous work on this dataset use only the digits.
 Here we use the whole classes both in the training and testing phase.