Mercurial > ift6266
changeset 434:310c730516af
added description of nist19 and captcha data sources
author | goldfinger |
---|---|
date | Mon, 03 May 2010 03:08:34 -0400 |
parents | 858ee3c76497 |
children | d8129a09ffb1 |
files | writeup/techreport.tex |
diffstat | 1 files changed, 7 insertions(+), 6 deletions(-) |
--- a/writeup/techreport.tex Mon May 03 02:44:11 2010 -0400
+++ b/writeup/techreport.tex Mon May 03 03:08:34 2010 -0400
@@ -248,12 +248,13 @@
 \begin{itemize}
 \item
 {\bf NIST}
-The NIST Special Database 19 (NIST19) \ref{Grother} is a very widely used dataset for training and testing OCR systems. The dataset is
-composed with over 800 000 digits and characters (upper and lower cases), with hand checked classifications, extracted from
-handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes corresponding to "0"-"9",
-"A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity. The fourth series, $hsf_4$,
-experimentally recognized to be the most difficult one for classification task is recommended by NIST as testing set and is
-used in our work for that purpose. The performances reported by previous work on that dataset mostly use only the digits.
+The NIST Special Database 19 (NIST19) is a very widely used dataset for training and testing OCR systems.
+The dataset is composed with over 800 000 digits and characters (upper and lower cases), with hand checked classifications,
+extracted from handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes
+corresponding to "0"-"9","A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity.
+The fourth series, $hsf_4$, experimentally recognized to be the most difficult one for classification task is recommended
+by NIST as testing set and is used in our work for that purpose.
+The performances reported by previous work on that dataset mostly use only the digits.
 Here we use the whole classes both in the training and testing phase.
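
Note on the 62 classes described in the changed paragraph: the text only states that the labels correspond to "0"-"9", "A"-"Z" and "a"-"z". A minimal Python sketch of one possible label encoding follows. The index ordering (digits first, then upper case, then lower case) and the names `CLASSES`, `char_to_label` and `label_to_char` are assumptions made for illustration; they are not taken from the report or from the NIST19 distribution.

```python
import string

# Assumed ordering: "0"-"9", then "A"-"Z", then "a"-"z" (10 + 26 + 26 = 62).
CLASSES = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Lookup table from character to integer class index (hypothetical encoding).
_LABEL_OF = {ch: i for i, ch in enumerate(CLASSES)}


def char_to_label(ch):
    """Return the class index of a single character, under the assumed ordering."""
    return _LABEL_OF[ch]


def label_to_char(label):
    """Return the character for a class index in the range 0..61."""
    return CLASSES[label]
```

A digits-only experiment, as in the previous work mentioned in the paragraph, would correspond to restricting labels to the first 10 indices under this assumed ordering.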