# HG changeset patch
# User goldfinger
# Date 1272870514 14400
# Node ID 310c730516af5517058f2d6ac12ca0a175fd29e5
# Parent 858ee3c7649784825d6b3c3b66bf6ff1ff7fa6bf
added description of nist19 and captcha data sources

diff -r 858ee3c76497 -r 310c730516af writeup/techreport.tex
--- a/writeup/techreport.tex Mon May 03 02:44:11 2010 -0400
+++ b/writeup/techreport.tex Mon May 03 03:08:34 2010 -0400
@@ -248,12 +248,13 @@
 \begin{itemize}
 \item
 {\bf NIST}
-The NIST Special Database 19 (NIST19) \ref{Grother} is a very widely used dataset for training and testing OCR systems. The dataset is
-composed with over 800 000 digits and characters (upper and lower cases), with hand checked classifications, extracted from
-handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes corresponding to "0"-"9",
-"A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity. The fourth series, $hsf_4$,
-experimentally recognized to be the most difficult one for classification task is recommended by NIST as testing set and is
-used in our work for that purpose. The performances reported by previous work on that dataset mostly use only the digits.
+The NIST Special Database 19 (NIST19) is a very widely used dataset for training and testing OCR systems.
+The dataset is composed with over 800 000 digits and characters (upper and lower cases), with hand checked classifications,
+extracted from handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes
+corresponding to "0"-"9","A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity.
+The fourth series, $hsf_4$, experimentally recognized to be the most difficult one for classification task is recommended
+by NIST as testing set and is used in our work for that purpose.
+The performances reported by previous work on that dataset mostly use only the digits.
 Here we use the whole classes both in the training and testing phase.