comparison writeup/techreport.tex @ 434:310c730516af (ift6266 repository)
added description of nist19 and captcha data sources
author | goldfinger |
---|---|
date | Mon, 03 May 2010 03:08:34 -0400 |
parents | e2fd928a7de0 |
children | 479f2f518fc9 |
433:858ee3c76497 | 434:310c730516af |
---|---|
246 | 246 |
247 \subsubsection{Data Sources} | 247 \subsubsection{Data Sources} |
248 | 248 |
249 \begin{itemize} | 249 \begin{itemize} |
250 \item {\bf NIST} | 250 \item {\bf NIST} |
251 The NIST Special Database 19 (NIST19) \cite{Grother} is a very widely used dataset for training and testing OCR systems. The dataset is | 251 The NIST Special Database 19 (NIST19) is a very widely used dataset for training and testing OCR systems. |
252 composed of over 800\,000 digits and characters (upper and lower case), with hand-checked classifications, extracted from | 252 The dataset is composed of over 800\,000 digits and characters (upper and lower case), with hand-checked classifications, |
253 handwritten sample forms of 3600 writers. Each character is labelled with one of 62 classes corresponding to ``0''--``9'', | 253 extracted from handwritten sample forms of 3600 writers. Each character is labelled with one of 62 classes |
254 ``A''--``Z'' and ``a''--``z''. The dataset contains 8 series of varying complexity. The fourth series, $hsf_4$, | 254 corresponding to ``0''--``9'', ``A''--``Z'' and ``a''--``z''. The dataset contains 8 series of varying complexity. |
255 experimentally found to be the most difficult for classification, is recommended by NIST as the test set and is | 255 The fourth series, $hsf_4$, experimentally found to be the most difficult for classification, is recommended |
256 used in our work for that purpose. Most results reported in previous work on this dataset use only the digits. | 256 by NIST as the test set and is used in our work for that purpose. |
| | 257 Most results reported in previous work on this dataset use only the digits. |
257 Here we use all 62 classes in both the training and testing phases. | 258 Here we use all 62 classes in both the training and testing phases. |
258 | 259 |
259 | 260 |
260 \item {\bf Fonts} | 261 \item {\bf Fonts} |
261 \item {\bf Captchas} | 262 \item {\bf Captchas} |