# HG changeset patch
# User Xavier Glorot
# Date 1275254666 14400
# Node ID 534d4ecf1bd1e00d7f0c47d8c8cf6b7e5850c698
# Parent db28764b8252a37ec7bfee656d272434b43bdec2
small description of the font added

diff -r db28764b8252 -r 534d4ecf1bd1 writeup/techreport.tex
--- a/writeup/techreport.tex	Sun May 30 12:06:45 2010 -0400
+++ b/writeup/techreport.tex	Sun May 30 17:24:26 2010 -0400
@@ -429,6 +429,13 @@
 
 \item {\bf Fonts}
 
+In order to obtain a wide variety of sources, we downloaded a large number of free fonts from {\tt http://anonymous.url.net}.
+%real address {\tt http://cg.scs.carleton.ca/~luc/freefonts.html}
+Together with the fonts shipped with Windows 7, this gives a total of $9817$ different fonts, from which we sample uniformly.
+The TrueType (ttf) file of the selected font is either passed to the Captcha generator (see next item) or used to render
+a character image that is fed directly to our models.
+%Guillaume, are there other details I forgot about the font selection?
+
 \item {\bf Captchas}
 The Captcha data source is an adaptation of the \emph{pycaptcha} library (a python based captcha generator library) for generating characters of the same format as the NIST dataset.
 The core of this data source is composed with a random character
@@ -442,7 +449,6 @@
 \subsubsection{Data Sets}
 
 \begin{itemize}
-\item {\bf NIST}
 \item {\bf P07}
 The dataset P07 is sampled with our transformation pipeline with a complexity parameter of $0.7$.
 For each new exemple to generate, we choose one source with the following probability: $0.1$ for the fonts,
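The font selection described in the Fonts item (a pool of downloaded free fonts plus the Windows 7 fonts, with one font drawn uniformly at random) can be sketched in Python, the language of the pycaptcha-based Captcha source. This is a minimal sketch under stated assumptions: the helper names `collect_font_files` and `sample_font`, the directory-scanning approach, and the `.ttf`-only filter are hypothetical and are not taken from the report or its code.

```python
import os
import random
from typing import List, Optional


def collect_font_files(font_dirs: List[str]) -> List[str]:
    """Return the sorted list of unique .ttf paths found under font_dirs.

    Hypothetical helper: assumes fonts are stored as .ttf files in a few
    directories (e.g. a free-font download folder and a system font folder).
    Duplicates are removed by path only, not by font content.
    """
    paths = set()
    for root_dir in font_dirs:
        # os.walk silently yields nothing for a directory that does not exist.
        for dirpath, _dirnames, filenames in os.walk(root_dir):
            for name in filenames:
                if name.lower().endswith(".ttf"):
                    paths.add(os.path.join(dirpath, name))
    return sorted(paths)


def sample_font(fonts: List[str], rng: Optional[random.Random] = None) -> str:
    """Draw one font file uniformly at random (probability 1/len(fonts) each)."""
    if not fonts:
        raise ValueError("no font files available")
    rng = rng if rng is not None else random.Random()
    return rng.choice(fonts)
```

For example, `sample_font(collect_font_files(["free_fonts", "windows_fonts"]))` returns one font path; the directory names are placeholders. Because duplicates are detected by path only, the same font stored in two folders would be counted twice, so the report's count of $9817$ different fonts may rely on a different criterion.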