# HG changeset patch
# User Yoshua Bengio
# Date 1292864075 18000
# Node ID 21d53fd07f6e057963087f1ef7c2dc7f4e8add0f
# Parent 5081206fe45bb92cd6d4b2686ba408d1ca449b23
reviews AISTATS

diff -r 5081206fe45b -r 21d53fd07f6e writeup/ReviewsAISTATS.html
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/writeup/ReviewsAISTATS.html Mon Dec 20 11:54:35 2010 -0500
@@ -0,0 +1,297 @@
+ Reviews For Paper
+
+ +
+ + + + + + + +
  +Reviews For Paper + +
+ + + + + + + + + + + + + + + + + +
Paper ID126
TitleDeep Learners Benefit More from Out-of-Distribution Examples
+ + +
+ + + + + + + + + +
+ Masked Reviewer ID: + + Assigned_Reviewer_2 +
+ Review: + +
+
+ + + + + + + + + + + + + + + + + + +
Question 
Overall rating: please synthesize your answers to other questions into an overall recommendation. Please take into account tradeoffs (an increase in one measure may compensate for a decrease in another), and describe the tradeoffs in the detailed comments. + Good: suggest accept +
Technical quality: is all included material presented clearly and correctly? + Good +
Originality: how much new work is represented in this paper, beyond previous conference/journal papers? + Substantial new material +
Interest and significance: would the paper's goal, if completely solved, represent a substantial advance for the AISTATS community? + Significant +
Thoroughness: to what degree does the paper support its conclusions through experimental comparisons, theorems, etc.? + Thorough +
Creativity: to what degree does the paper represent a novel way of setting up a problem or an unusual approach to solving it? + Most content represents application of known ideas +
Detailed Comments + This paper shows that deep networks benefit more from out-of-distribution examples than shallower architectures in a large-scale character recognition experiment. A thorough empirical validation shows that deep nets produce better discrimination (than shallower nets) when trained with distorted characters and when trained on multiple tasks. +
Although the methods used are already well established in the community, these results are significant and provide new insights on the representational power of this class of methods. +
+
Suggestions: +
- it would be interesting to compare the deep architecture and the shallow architecture at a matched model capacity (i.e., use a wider shallow net) +
- since the authors use denoising autoencoders to pre-train deep networks, they could consider using distorted characters as the noisy inputs, instead of artificially setting some inputs to 0. This might help the networks learn representations that are robust to distortions and thus actually useful for discrimination. +
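The reviewer's second suggestion can be sketched as follows. This is a toy, hypothetical illustration (not the paper's implementation) of a tied-weight denoising autoencoder trained by plain gradient descent on squared reconstruction error; `distort` is a stand-in for a real character distortion (here just a one-pixel roll), and all function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mask_noise(x, p=0.25):
    # standard denoising-AE corruption: zero out a random subset of inputs
    return x * (rng.random(x.shape) > p)

def distort(x):
    # hypothetical stand-in for a character distortion (a one-pixel shift);
    # any label-preserving transformation could be plugged in here
    return np.roll(x, 1, axis=-1)

def dae_step(x, W, b, c, lr=0.1, corrupt=mask_noise):
    # One gradient step on the squared reconstruction error of a
    # tied-weight denoising autoencoder: encode a corrupted input,
    # decode, and compare the reconstruction against the *clean* input.
    x_tilde = corrupt(x)
    h = sigmoid(x_tilde @ W + b)       # hidden code from corrupted input
    x_hat = sigmoid(h @ W.T + c)       # reconstruction of the clean input
    d_out = (x_hat - x) * x_hat * (1 - x_hat)   # output-layer delta
    d_hid = (d_out @ W) * h * (1 - h)           # hidden-layer delta
    gW = x_tilde.T @ d_hid + d_out.T @ h        # tied weights: two gradient paths
    W = W - lr * gW
    b = b - lr * d_hid.sum(axis=0)
    c = c - lr * d_out.sum(axis=0)
    loss = 0.5 * np.sum((x_hat - x) ** 2)
    return W, b, c, loss
```

Passing `corrupt=distort` instead of the default `corrupt=mask_noise` is precisely the substitution the reviewer proposes: the corruption the model learns to undo becomes a realistic distortion rather than random masking.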
+
+ + +
+ + + + + + + + + +
+ Masked Reviewer ID: + + Assigned_Reviewer_3 +
+ Review: + +
+
+ + + + + + + + + + + + + + + + + + +
Question 
Overall rating: please synthesize your answers to other questions into an overall recommendation. Please take into account tradeoffs (an increase in one measure may compensate for a decrease in another), and describe the tradeoffs in the detailed comments. + Very good: suggest accept +
Technical quality: is all included material presented clearly and correctly? + Very good +
Originality: how much new work is represented in this paper, beyond previous conference/journal papers? + Substantial new material +
Interest and significance: would the paper's goal, if completely solved, represent a substantial advance for the AISTATS community? + Significant +
Thoroughness: to what degree does the paper support its conclusions through experimental comparisons, theorems, etc.? + Thorough +
Creativity: to what degree does the paper represent a novel way of setting up a problem or an unusual approach to solving it? + Most content represents novel approaches +
Detailed Comments + This paper claims that out-of-distribution examples can be more helpful in training deep architectures than shallow ones. To test this hypothesis, the paper develops extensive transformations for image patches (i.e., images of handwritten characters) to generate a large-scale dataset of perturbed images. MLPs and stacked denoising auto-encoders (SDAs) are then trained on these out-of-distribution examples. In the experiments, the paper shows that SDAs outperform MLPs, achieving human-level performance on the NIST dataset. The paper also provides two interesting experiments showing that: (1) SDAs can benefit from training on perturbed data, even when tested on clean data; (2) SDAs can significantly benefit from multi-task learning. +
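The kind of transformation pipeline the review describes can be caricatured as follows. The paper's actual pipeline (slant, thickness, elastic deformations, background images, etc.) is far richer; the two operations and all names here are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def shift(img, max_px=2):
    # translate the character by a small random offset (with wrap-around)
    dy, dx = rng.integers(-max_px, max_px + 1, size=2)
    return np.roll(np.roll(img, dy, axis=0), dx, axis=1)

def salt_pepper(img, p=0.05):
    # flip a small fraction of pixels to random extreme values
    out = img.copy()
    flips = rng.random(img.shape) < p
    out[flips] = rng.integers(0, 2, size=int(flips.sum())).astype(img.dtype)
    return out

def perturb(img, ops=(shift, salt_pepper)):
    # compose label-preserving perturbations to produce an
    # out-of-distribution variant of a character image
    for op in ops:
        img = op(img)
    return img
```

Because every operation preserves the class label, arbitrarily many perturbed variants can be generated per clean character, which is how a large-scale out-of-distribution training set is obtained.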
+
+
Questions, comments, and suggestions: +
1. Regarding the human labeling, I have some concerns about labeling noise/biases due to AMT. How were anomalies or outliers in the labeling controlled? Was there any procedure to minimize labeling noise/biases or to ensure that human labelers tried their best (e.g., filtering out random guesses, or encouraging labelers to consider all possibilities carefully before committing to an answer)? For example, a multi-stage questionnaire (e.g., asking "character or digit?", then "uppercase or lowercase?", then choosing one of the 10 digits or 26 letters) might reduce labeling noise/biases significantly more than showing all 62 candidate answers simultaneously. +
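The staged questionnaire the reviewer sketches could look like the following. This is a hypothetical illustration of the idea only; the names and staging are not from the paper.

```python
import string

# The 62 NIST classes: 10 digits + 26 uppercase + 26 lowercase letters.
DIGITS = list(string.digits)
UPPER = list(string.ascii_uppercase)
LOWER = list(string.ascii_lowercase)

def candidates(is_digit, is_upper=None):
    # Stage 1: digit or letter?  Stage 2 (letters only): upper or lower case?
    # Each stage shrinks the final menu from 62 options to at most 26,
    # which is the reviewer's proposed way to reduce labeling noise.
    if is_digit:
        return DIGITS
    return UPPER if is_upper else LOWER
```

The point of the staging is that a labeler never faces more than a binary choice followed by a menu of at most 26 items, instead of one 62-way choice.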
+
2. It seems that the paper fixed the number of hidden layers at three. Despite the good performance of the proposed architecture, it is somewhat unclear whether the benefit comes mainly from the deep architecture or from the use of denoising auto-encoders. +
+
Therefore, it would be more interesting to see the effect of the number of layers and of other pre-training methods (e.g., RBMs or plain auto-encoders). Such an experiment would clarify where the benefit comes from (i.e., deep architecture vs. pre-training modules) and provide more insight into the results. +
+
3. The paper briefly mentions the use of libSVM, but it would be useful to compare against results obtained with an online SVM (e.g., PEGASOS). +
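For reference, PEGASOS is a primal stochastic sub-gradient solver for the linear SVM objective, which is what makes it attractive at this dataset scale. A minimal sketch of the basic variant (no kernel, no optional projection step) follows; it is an illustration, not the comparison the reviewer asks for.

```python
import numpy as np

def pegasos(X, y, lam=0.01, n_iters=2000, seed=0):
    # Minimize  (lam/2)*||w||^2 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)
    # by stochastic sub-gradient descent with step size eta_t = 1/(lam * t).
    # Labels y must be in {-1, +1}.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(1, n_iters + 1):
        i = rng.integers(n)                   # pick one example at random
        eta = 1.0 / (lam * t)                 # decreasing step size
        if y[i] * (w @ X[i]) < 1.0:           # hinge loss is active
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:                                 # only the regularizer contributes
            w = (1.0 - eta * lam) * w
    return w
```

Each iteration touches a single example, so the cost per step is independent of the dataset size, in contrast to batch solvers such as libSVM.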
+
4. The paper also discusses the effect of large labeled datasets in the self-taught learning setting. To strengthen this claim, it would be helpful to show test accuracy as a function of the number of labeled examples. +
+
Overall, the paper is clearly written, and it provides interesting experiments on large-scale datasets, addressing a number of interesting questions related to deep learning and multi-task learning. Furthermore, this work can provide a new large-scale benchmark dataset (beyond MNIST) for deep learning and machine learning research. +
+
+
+ + +
+
+ +
+
 
+
+ + + \ No newline at end of file