ift6266: writeup/nips2010_submission.tex comparison

Added the number of training, test and validation examples for NIST

author	Xavier Glorot <glorotxa@iro.umontreal.ca>
date	Sun, 30 May 2010 19:05:22 -0400
parents	6593e67381a3
children	ce69aa9204d8


-:6593e67381a3
+:150203d2b5c3
 extracted from handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes
 corresponding to "0"-"9","A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity.
 The fourth series, $hsf_4$, experimentally recognized to be the most difficult one, is recommended
 by NIST as testing set and is used in our work and some previous work~\cite{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002,Milgram+al-2005}
 for that purpose. We randomly split the remainder into a training set and a validation set for
-model selection. The sizes of these data sets are:  for training, XXX for validation,
+model selection. The sizes of these data sets are: 651668 for training, 80000 for validation,
-and XXX for testing.
+and 82587 for testing.
 The performances reported by previous work on that dataset mostly use only the digits.
 Here we use all the classes both in the training and testing phase. This is especially
 useful to estimate the effect of a multi-task setting.
 Note that the distribution of the classes in the NIST training and test sets differs
 substantially, with relatively many more digits in the test set, and uniform distribution
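
The random train/validation split described in the hunk above can be sketched as follows. This is only an illustration under stated assumptions: the helper name `split_remainder`, the fixed seed, and the shuffle-then-slice strategy are hypothetical and not taken from the paper's code. Only the three set sizes come from the revised text.

```python
import random

# Set sizes quoted in the revised paragraph (training, validation, testing).
N_TRAIN, N_VALID, N_TEST = 651668, 80000, 82587


def split_remainder(n_examples, n_valid, seed=0):
    """Randomly partition indices 0..n_examples-1 into (train, valid).

    Hypothetical helper: the seed and the shuffle-then-slice strategy are
    assumptions, not the procedure actually used for the paper.
    """
    if not 0 <= n_valid <= n_examples:
        raise ValueError("n_valid must lie between 0 and n_examples")
    indices = list(range(n_examples))
    # A private Random instance keeps the split reproducible without
    # touching the global random state.
    random.Random(seed).shuffle(indices)
    # The first n_valid shuffled indices form the validation set,
    # the rest form the training set.
    return indices[n_valid:], indices[:n_valid]


# The "remainder" is everything outside hsf_4, i.e. training plus validation.
train_idx, valid_idx = split_remainder(N_TRAIN + N_VALID, N_VALID)
```

The indices refer to positions in the non-$hsf_4$ data only. The test set is fixed by the NIST recommendation and is therefore not drawn at random here.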
