comparison writeup/nips2010_submission.tex @ 480:150203d2b5c3

added number of train test and valid for NIST
author Xavier Glorot <glorotxa@iro.umontreal.ca>
date Sun, 30 May 2010 19:05:22 -0400
parents 6593e67381a3
children ce69aa9204d8
479:6593e67381a3 480:150203d2b5c3
313 extracted from handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes 313 extracted from handwritten sample forms of 3600 writers. The characters are labelled by one of the 62 classes
314 corresponding to "0"-"9","A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity. 314 corresponding to "0"-"9","A"-"Z" and "a"-"z". The dataset contains 8 series of different complexity.
315 The fourth series, $hsf_4$, experimentally recognized to be the most difficult one is recommended 315 The fourth series, $hsf_4$, experimentally recognized to be the most difficult one is recommended
316 by NIST as testing set and is used in our work and some previous work~\cite{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002,Milgram+al-2005} 316 by NIST as testing set and is used in our work and some previous work~\cite{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002,Milgram+al-2005}
317 for that purpose. We randomly split the remainder into a training set and a validation set for 317 for that purpose. We randomly split the remainder into a training set and a validation set for
318 model selection. The sizes of these data sets are: for training, XXX for validation, 318 model selection. The sizes of these data sets are: 651668 for training, 80000 for validation,
319 and XXX for testing. 319 and 82587 for testing.
320 The performances reported by previous work on that dataset mostly use only the digits. 320 The performances reported by previous work on that dataset mostly use only the digits.
321 Here we use all the classes both in the training and testing phase. This is especially 321 Here we use all the classes both in the training and testing phase. This is especially
322 useful to estimate the effect of a multi-task setting. 322 useful to estimate the effect of a multi-task setting.
323 Note that the distribution of the classes in the NIST training and test sets differs 323 Note that the distribution of the classes in the NIST training and test sets differs
324 substantially, with relatively many more digits in the test set, and uniform distribution 324 substantially, with relatively many more digits in the test set, and uniform distribution