comparison writeup/nips2010_submission.tex @ 513:66a905508e34
resolved merge conflict
author    Yoshua Bengio <bengioy@iro.umontreal.ca>
date      Tue, 01 Jun 2010 14:05:02 -0400
parents   6f042a71be23 8c2ab4f246b1
children  920a38715c90
512:6f042a71be23 | 513:66a905508e34 |
18 \vspace*{-2mm} | 18 \vspace*{-2mm} |
19 \begin{abstract} | 19 \begin{abstract} |
20 Recent theoretical and empirical work in statistical machine learning has | 20 Recent theoretical and empirical work in statistical machine learning has |
21 demonstrated the importance of learning algorithms for deep | 21 demonstrated the importance of learning algorithms for deep |
22 architectures, i.e., function classes obtained by composing multiple | 22 architectures, i.e., function classes obtained by composing multiple |
23 non-linear transformations. Self-taught learning (exploiting unlabeled | 23 non-linear transformations. Self-taught learning (exploiting unlabeled |
24 examples or examples from other distributions) has already been applied | 24 examples or examples from other distributions) has already been applied |
25 to deep learners, but mostly to show the advantage of unlabeled | 25 to deep learners, but mostly to show the advantage of unlabeled |
26 examples. Here we explore the advantage brought by {\em out-of-distribution | 26 examples. Here we explore the advantage brought by {\em out-of-distribution |
27 examples} and show that {\em deep learners benefit more from them than a | 27 examples} and show that {\em deep learners benefit more from them than a |
28 corresponding shallow learner}, in the area | 28 corresponding shallow learner}, in the area |
72 applied here, is the Denoising | 72 applied here, is the Denoising |
73 Auto-Encoder~(DAE)~\citep{VincentPLarochelleH2008-very-small}, which | 73 Auto-Encoder~(DAE)~\citep{VincentPLarochelleH2008-very-small}, which |
74 performed similarly or better than previously proposed Restricted Boltzmann | 74 performed similarly or better than previously proposed Restricted Boltzmann |
75 Machines in terms of unsupervised extraction of a hierarchy of features | 75 Machines in terms of unsupervised extraction of a hierarchy of features |
76 useful for classification. The principle is that each layer starting from | 76 useful for classification. The principle is that each layer starting from |
77 the bottom is trained to encode its input (the output of the previous | 77 the bottom is trained to encode its input (the output of the previous |
78 layer) and to reconstruct it from a corrupted version of it. After this | 78 layer) and to reconstruct it from a corrupted version of it. After this |
79 unsupervised initialization, the stack of denoising auto-encoders can be | 79 unsupervised initialization, the stack of denoising auto-encoders can be |
80 converted into a deep supervised feedforward neural network and fine-tuned by | 80 converted into a deep supervised feedforward neural network and fine-tuned by |
81 stochastic gradient descent. | 81 stochastic gradient descent. |
82 | 82 |
83 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles | 83 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles |
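The hunk above describes the denoising auto-encoder training principle in prose only. Below is a minimal numpy sketch of a single DAE layer, assuming tied weights, sigmoid units, masking noise, and a squared-error reconstruction loss; the layer sizes, learning rate, and corruption level are illustrative assumptions, not values reported in the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def dae_step(x, W, b_hid, b_vis, lr=0.1, corruption=0.25):
        """One SGD step of denoising auto-encoder training on one example."""
        mask = rng.random(x.shape) > corruption      # masking noise
        x_tilde = x * mask                           # corrupted input
        h = sigmoid(x_tilde @ W + b_hid)             # encode the corrupted input
        x_hat = sigmoid(h @ W.T + b_vis)             # decode with tied weights
        err = x_hat - x                              # reconstruct the *clean* input
        d_vis = err * x_hat * (1 - x_hat)            # backprop through decoder
        d_hid = (d_vis @ W) * h * (1 - h)            # backprop through encoder
        W -= lr * (np.outer(d_vis, h) + np.outer(x_tilde, d_hid))
        b_vis -= lr * d_vis
        b_hid -= lr * d_hid
        return 0.5 * np.sum(err ** 2)

    n_in, n_hid = 1024, 500                          # e.g. 32x32 input images
    W = rng.normal(0.0, 0.01, (n_in, n_hid))
    b_hid, b_vis = np.zeros(n_hid), np.zeros(n_in)
    for _ in range(10):                              # toy loop on random data
        dae_step(rng.random(n_in), W, b_hid, b_vis)

After each layer is trained this way in turn, the stack initializes a feedforward classifier that is fine-tuned by supervised stochastic gradient descent, as the text describes.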
117 Similarly, does the feature learning step in deep learning algorithms benefit more | 117 Similarly, does the feature learning step in deep learning algorithms benefit more |
118 from training with similar but different classes (i.e., a multi-task learning scenario) than | 118 from training with similar but different classes (i.e., a multi-task learning scenario) than |
119 a corresponding shallow and purely supervised architecture? | 119 a corresponding shallow and purely supervised architecture? |
120 %\end{enumerate} | 120 %\end{enumerate} |
121 | 121 |
122 Our experimental results provide evidence to support positive answers to all of these questions. | 122 The experimental results presented here support positive answers to all of these questions. |
123 | 123 |
124 \vspace*{-1mm} | 124 \vspace*{-1mm} |
125 \section{Perturbation and Transformation of Character Images} | 125 \section{Perturbation and Transformation of Character Images} |
126 \vspace*{-1mm} | 126 \vspace*{-1mm} |
127 | 127 |
202 $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times | 202 $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times |
203 \sqrt[3]{complexity}$.\\ | 203 \sqrt[3]{complexity}$.\\ |
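A quick numerical reading of the two parameter formulas above, under the assumption (suggested by the formulas but not stated in this excerpt) that complexity ranges over [0, 1]:

    # Direct transcription of the alpha/sigma schedule above.
    def deformation_params(complexity):
        alpha = complexity ** (1.0 / 3.0) * 10.0          # deformation amplitude
        sigma = 10.0 - 7.0 * complexity ** (1.0 / 3.0)    # smoothing width
        return alpha, sigma

    print(deformation_params(0.0))   # (0.0, 10.0): no deformation, heavy smoothing
    print(deformation_params(1.0))   # (10.0, 3.0): strongest, least-smoothed deformation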
204 {\bf Pinch.} | 204 {\bf Pinch.} |
205 This GIMP filter is named ``Whirl and | 205 This GIMP filter is named ``Whirl and |
206 pinch'', but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic | 206 pinch'', but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic |
207 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). | 207 surface and pressing or pulling on the center of the surface''~\citep{GIMP-manual}. |
208 For a square input image, think of drawing a circle of | 208 For a square input image, think of drawing a circle of |
209 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to | 209 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to |
210 that disk (region inside circle) will have its value recalculated by taking | 210 that disk (region inside circle) will have its value recalculated by taking |
211 the value of another ``source'' pixel in the original image. The position of | 211 the value of another ``source'' pixel in the original image. The position of |
212 that source pixel is found on the line that goes through $C$ and $P$, but | 212 that source pixel is found on the line that goes through $C$ and $P$, but |
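The hunk ends mid-sentence, before the exact rule locating the source pixel. As a rough illustration of the mapping it describes (a source pixel on the line through $C$ and $P$, pulled toward the center), here is a sketch using an assumed radial remapping d_src = d * (d/r) ** k; the actual GIMP formula differs and is not given in this excerpt.

    import numpy as np

    def pinch(img, r, k=0.5):
        """Toy pinch: remap pixels inside the disk of radius r (assumed formula)."""
        h, w = img.shape
        cy, cx = (h - 1) / 2.0, (w - 1) / 2.0    # center point C
        out = img.copy()
        for y in range(h):
            for x in range(w):
                d = np.hypot(y - cy, x - cx)     # distance from C to P
                if 0 < d < r:                    # only the disk is affected
                    d_src = d * (d / r) ** k     # source lies between C and P,
                    sy = int(round(cy + (y - cy) * d_src / d))   # stretching the
                    sx = int(round(cx + (x - cx) * d_src / d))   # center outward
                    out[y, x] = img[sy, sx]      # copy value from source pixel
        return out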
336 the best SDA (again according to validation set error), along with a precise estimate | 336 the best SDA (again according to validation set error), along with a precise estimate |
337 of human performance obtained via Amazon's Mechanical Turk (AMT) | 337 of human performance obtained via Amazon's Mechanical Turk (AMT) |
338 service\footnote{http://mturk.com}. | 338 service\footnote{http://mturk.com}. |
339 AMT users are paid small amounts | 339 AMT users are paid small amounts |
340 of money to perform tasks for which human intelligence is required. | 340 of money to perform tasks for which human intelligence is required. |
341 Mechanical Turk has been used extensively in natural language processing and vision. | 341 Mechanical Turk has been used extensively in natural language |
342 %processing \citep{SnowEtAl2008} and vision | 342 processing \citep{SnowEtAl2008} and vision |
343 %\citep{SorokinAndForsyth2008,whitehill09}. | 343 \citep{SorokinAndForsyth2008,whitehill09}. |
344 %\citep{SorokinAndForsyth2008,whitehill09}. | |
345 AMT users were presented | 344 AMT users were presented |
346 with 10 character images and asked to type 10 corresponding ASCII | 345 with 10 character images and asked to type 10 corresponding ASCII |
347 characters. They were forced to make a hard choice among the | 346 characters. They were forced to make a hard choice among the |
348 62 or 10 character classes (all classes or digits only). | 347 62 or 10 character classes (all classes or digits only). |
349 Three users classified each image, allowing | 348 Three users classified each image, allowing |
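The hunk stops before saying what the three labels per image are used for. One plausible aggregation (an assumption, not a procedure stated in this excerpt) is a majority vote over the three typed characters:

    from collections import Counter

    def majority_label(labels):
        """labels: the three ASCII characters typed by the three AMT users."""
        return Counter(labels).most_common(1)[0][0]   # ties broken arbitrarily

    print(majority_label(["a", "a", "o"]))   # hypothetical responses -> "a"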
585 \fi | 584 \fi |
586 | 585 |
587 | 586 |
588 \begin{figure}[h] | 587 \begin{figure}[h] |
589 \resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}\\ | 588 \resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}\\ |
590 \caption{Charts corresponding to tables 2 (left) and 3 (right), from Appendix I.} | 589 \caption{Relative improvement in error rate due to self-taught learning. |
 | 590 Left: Improvement (or loss, when negative) |
 | 591 induced by out-of-distribution examples (perturbed data). |
 | 592 Right: Improvement (or loss, when negative) induced by multi-task |
 | 593 learning (training on all classes and testing only on either digits, |
 | 594 upper case, or lower-case). The deep learner (SDA) benefits more from |
 | 595 both self-taught learning scenarios, compared to the shallow MLP.} |
591 \label{fig:improvements-charts} | 596 \label{fig:improvements-charts} |
592 \end{figure} | 597 \end{figure} |
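The new caption reports relative improvements in error rate. A minimal sketch of that quantity, assuming it is the relative reduction of the baseline error (the exact definition lives in the paper's tables, not in this excerpt):

    def relative_improvement(err_baseline, err_selftaught):
        """Positive when self-taught learning helps, negative when it hurts."""
        return (err_baseline - err_selftaught) / err_baseline

    # e.g. error falling from 2.0% to 1.4% is a ~30% relative improvement
    print(relative_improvement(0.020, 0.014))   # ~0.30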
593 | 598 |
594 \vspace*{-1mm} | 599 \vspace*{-1mm} |
595 \section{Conclusions} | 600 \section{Conclusions} |