comparison writeup/techreport.tex @ 434:310c730516af

added description of nist19 and captcha data sources
author goldfinger
date Mon, 03 May 2010 03:08:34 -0400
parents e2fd928a7de0
children 479f2f518fc9
433:858ee3c76497 434:310c730516af
246 246
247 \subsubsection{Data Sources} 247 \subsubsection{Data Sources}
248 248
249 \begin{itemize} 249 \begin{itemize}
250 \item {\bf NIST} 250 \item {\bf NIST}
251 The NIST Special Database 19 (NIST19) \ref{Grother} is a very widely used dataset for training and testing OCR systems. The dataset is 251 The NIST Special Database 19 (NIST19) is a very widely used dataset for training and testing OCR systems.
252 composed of over 800 000 digits and characters (upper and lower case), with hand-checked classifications, extracted from 252 The dataset is composed of over 800 000 digits and characters (upper and lower case), with hand-checked classifications,
253 handwritten sample forms of 3600 writers. Each character is labelled with one of the 62 classes corresponding to "0"-"9", 253 extracted from handwritten sample forms of 3600 writers. Each character is labelled with one of the 62 classes
254 "A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity. The fourth series, $hsf_4$, 254 corresponding to "0"-"9", "A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity.
255 experimentally recognized as the most difficult one for classification, is recommended by NIST as the test set and is 255 The fourth series, $hsf_4$, experimentally recognized as the most difficult one for classification, is recommended
256 used in our work for that purpose. Previous work on this dataset mostly reports performance on the digits only. 256 by NIST as the test set and is used in our work for that purpose.
257 Previous work on this dataset mostly reports performance on the digits only.
257 Here we use all 62 classes in both the training and testing phases. 258 Here we use all 62 classes in both the training and testing phases.
258 259
259 260
260 \item {\bf Fonts} 261 \item {\bf Fonts}
261 \item {\bf Captchas} 262 \item {\bf Captchas}