Mercurial > ift6266
comparison writeup/nips2010_submission.tex @ 559:cf5a7ee2d892
Merged
author | Olivier Delalleau <delallea@iro> |
---|---|
date | Thu, 03 Jun 2010 09:18:02 -0400 |
parents | 143a1467f157 17d16700e0c8 |
children | dc5c3f538a05 |
comparison
558:143a1467f157 | 559:cf5a7ee2d892 |
---|---|
105 a corresponding shallow and purely supervised architecture? | 105 a corresponding shallow and purely supervised architecture? |
106 %\end{enumerate} | 106 %\end{enumerate} |
107 | 107 |
108 Our experimental results provide positive evidence towards all of these questions. | 108 Our experimental results provide positive evidence towards all of these questions. |
109 To achieve these results, we introduce in the next section a sophisticated system | 109 To achieve these results, we introduce in the next section a sophisticated system |
110 for stochastically transforming character images. The conclusion discusses | 110 for stochastically transforming character images and then explain the methodology. |
111 The conclusion discusses | |
111 the more general question of why deep learners may benefit so much from | 112 the more general question of why deep learners may benefit so much from |
112 the self-taught learning framework. | 113 the self-taught learning framework. |
113 | 114 |
114 \vspace*{-1mm} | 115 \vspace*{-1mm} |
115 \section{Perturbation and Transformation of Character Images} | 116 \section{Perturbation and Transformation of Character Images} |
163 \end{center} | 164 \end{center} |
164 %\vspace{.6cm} | 165 %\vspace{.6cm} |
165 %\end{minipage}% | 166 %\end{minipage}% |
166 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 167 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} |
167 \end{wrapfigure} | 168 \end{wrapfigure} |
168 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} | 169 To change character {\bf thickness}, morphological operators of dilation and erosion~\citep{Haralick87,Serra82} |
169 are applied. The neighborhood of each pixel is multiplied | 170 are applied. The neighborhood of each pixel is multiplied |
170 element-wise with a {\em structuring element} matrix. | 171 element-wise with a {\em structuring element} matrix. |
171 The pixel value is replaced by the maximum or the minimum of the resulting | 172 The pixel value is replaced by the maximum or the minimum of the resulting |
172 matrix, respectively for dilation or erosion. Ten different structuring elements with | 173 matrix, respectively for dilation or erosion. Ten different structuring elements with |
173 increasing dimensions (largest is $5\times5$) were used. For each image, | 174 increasing dimensions (largest is $5\times5$) were used. For each image, |
186 {\bf Slant} | 187 {\bf Slant} |
187 \end{minipage}% | 188 \end{minipage}% |
188 \hspace{0.3cm}\begin{minipage}[b]{0.83\linewidth} | 189 \hspace{0.3cm}\begin{minipage}[b]{0.83\linewidth} |
189 %\centering | 190 %\centering |
190 %\vspace*{-15mm} | 191 %\vspace*{-15mm} |
191 Each row of the image is shifted | 192 To produce {\bf slant}, each row of the image is shifted |
192 proportionally to its height: $shift = round(slant \times height)$. | 193 proportionally to its height: $shift = round(slant \times height)$. |
193 $slant \sim U[-complexity,complexity]$. | 194 $slant \sim U[-complexity,complexity]$. |
194 \vspace{1.5cm} | 195 \vspace{1.5cm} |
195 \end{minipage} | 196 \end{minipage} |
196 %\vspace*{-4mm} | 197 %\vspace*{-4mm} |
199 %\centering | 200 %\centering |
200 \begin{wrapfigure}[8]{l}{0.15\textwidth} | 201 \begin{wrapfigure}[8]{l}{0.15\textwidth} |
201 \vspace*{-6mm} | 202 \vspace*{-6mm} |
202 \begin{center} | 203 \begin{center} |
203 \includegraphics[scale=.4]{images/Affine_only.png}\\ | 204 \includegraphics[scale=.4]{images/Affine_only.png}\\ |
204 {\bf Affine} | 205 {\bf Affine Transformation} |
205 \end{center} | 206 \end{center} |
206 \end{wrapfigure} | 207 \end{wrapfigure} |
207 %\end{minipage}% | 208 %\end{minipage}% |
208 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 209 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} |
209 A $2 \times 3$ affine transform matrix (with | 210 A $2 \times 3$ {\bf affine transform} matrix (with |
210 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$. | 211 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$. |
211 Output pixel $(x,y)$ takes the value of input pixel | 212 Output pixel $(x,y)$ takes the value of input pixel |
212 nearest to $(ax+by+c,dx+ey+f)$, | 213 nearest to $(ax+by+c,dx+ey+f)$, |
213 producing scaling, translation, rotation and shearing. | 214 producing scaling, translation, rotation and shearing. |
214 Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to | 215 Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to |
232 \end{center} | 233 \end{center} |
233 \end{wrapfigure} | 234 \end{wrapfigure} |
234 %\end{minipage}% | 235 %\end{minipage}% |
235 %\hspace{-3mm}\begin{minipage}[b]{0.85\linewidth} | 236 %\hspace{-3mm}\begin{minipage}[b]{0.85\linewidth} |
236 %\vspace*{-20mm} | 237 %\vspace*{-20mm} |
237 This local elastic deformation | 238 The {\bf local elastic} deformation |
238 filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, | 239 module induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, |
239 which provides more details. | 240 which provides more details. |
240 The intensity of the displacement fields is given by | 241 The intensity of the displacement fields is given by |
241 $\alpha = \sqrt[3]{complexity} \times 10.0$, which are | 242 $\alpha = \sqrt[3]{complexity} \times 10.0$, which are |
242 convolved with a Gaussian 2D kernel (resulting in a blur) of | 243 convolved with a Gaussian 2D kernel (resulting in a blur) of |
243 standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$. | 244 standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$. |
256 \end{center} | 257 \end{center} |
257 \end{wrapfigure} | 258 \end{wrapfigure} |
258 %\vspace{.6cm} | 259 %\vspace{.6cm} |
259 %\end{minipage}% | 260 %\end{minipage}% |
260 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 261 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} |
261 This is the ``Whirl and pinch'' GIMP filter with whirl set to 0. | 262 The {\bf pinch} module applies the ``Whirl and pinch'' GIMP filter with whirl set to 0. |
262 A pinch is ``similar to projecting the image onto an elastic | 263 A pinch is ``similar to projecting the image onto an elastic |
263 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). | 264 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). |
264 For a square input image, draw a radius-$r$ disk | 265 For a square input image, draw a radius-$r$ disk |
265 around $C$. Any pixel $P$ belonging to | 266 around $C$. Any pixel $P$ belonging to |
266 that disk has its value replaced by | 267 that disk has its value replaced by |
267 the value of a ``source'' pixel in the original image, | 268 the value of a ``source'' pixel in the original image, |
268 on the line that goes through $C$ and $P$, but | 269 on the line that goes through $C$ and $P$, but |
269 at some other distance $d_2$. Define $d_1=distance(P,C)$ and $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times | 270 at some other distance $d_2$. Define $d_1=distance(P,C)$ and $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times |
270 d_1$, where $pinch$ is a parameter to the filter. | 271 d_1$, where $pinch$ is a parameter of the filter. |
271 The actual value is given by bilinear interpolation considering the pixels | 272 The actual value is given by bilinear interpolation considering the pixels |
272 around the (non-integer) source position thus found. | 273 around the (non-integer) source position thus found. |
273 Here $pinch \sim U[-complexity, 0.7 \times complexity]$. | 274 Here $pinch \sim U[-complexity, 0.7 \times complexity]$. |
274 %\vspace{1.5cm} | 275 %\vspace{1.5cm} |
275 %\end{minipage} | 276 %\end{minipage} |
287 \includegraphics[scale=.4]{images/Motionblur_only.png}\\ | 288 \includegraphics[scale=.4]{images/Motionblur_only.png}\\ |
288 {\bf Motion Blur} | 289 {\bf Motion Blur} |
289 \end{minipage}% | 290 \end{minipage}% |
290 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} | 291 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} |
291 %\vspace*{.5mm} | 292 %\vspace*{.5mm} |
292 This is GIMP's ``linear motion blur'' | 293 The {\bf motion blur} module is GIMP's ``linear motion blur'', which |
293 with parameters $length$ and $angle$. The value of | 294 has parameters $length$ and $angle$. The value of |
294 a pixel in the final image is approximately the mean of the first $length$ pixels | 295 a pixel in the final image is approximately the mean of the first $length$ pixels |
295 found by moving in the $angle$ direction, | 296 found by moving in the $angle$ direction, |
296 $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$. | 297 $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$. |
297 \vspace{5mm} | 298 \vspace{5mm} |
298 \end{minipage} | 299 \end{minipage} |
305 {\bf Occlusion} | 306 {\bf Occlusion} |
306 %\vspace{.5cm} | 307 %\vspace{.5cm} |
307 \end{minipage}% | 308 \end{minipage}% |
308 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} | 309 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} |
309 \vspace*{-18mm} | 310 \vspace*{-18mm} |
310 Selects a random rectangle from an {\em occluder} character | 311 The {\bf occlusion} module selects a random rectangle from an {\em occluder} character |
311 image and places it over the original {\em occluded} | 312 image and places it over the original {\em occluded} |
312 image. Pixels are combined by taking the max(occluder,occluded), | 313 image. Pixels are combined by taking the max(occluder,occluded), |
313 closer to black. The rectangle corners | 314 closer to black. The rectangle corners |
314 are sampled so that larger complexity gives larger rectangles. | 315 are sampled so that larger complexity gives larger rectangles. |
315 The destination position in the occluded image is also sampled | 316 The destination position in the occluded image is also sampled |
316 according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}). | 317 according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}). |
317 This filter is skipped with probability 60\%. | 318 This module is skipped with probability 60\%. |
318 %\vspace{7mm} | 319 %\vspace{7mm} |
319 \end{minipage} | 320 \end{minipage} |
320 | 321 |
321 \vspace*{1mm} | 322 \vspace*{1mm} |
322 | 323 |
330 \end{center} | 331 \end{center} |
331 \end{wrapfigure} | 332 \end{wrapfigure} |
332 %\vspace{.5cm} | 333 %\vspace{.5cm} |
333 %\end{minipage}% | 334 %\end{minipage}% |
334 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} | 335 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} |
335 Different regions of the image are spatially smoothed by convolving | 336 With the {\bf Gaussian smoothing} module, |
337 different regions of the image are spatially smoothed by convolving | |
336 the image with a symmetric Gaussian kernel of | 338 the image with a symmetric Gaussian kernel of |
337 size and variance chosen uniformly in the ranges $[12,12 + 20 \times | 339 size and variance chosen uniformly in the ranges $[12,12 + 20 \times |
338 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized | 340 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized |
339 between $0$ and $1$. We also create a symmetric weighted averaging window, of the | 341 between $0$ and $1$. We also create a symmetric weighted averaging window, of the |
340 kernel size, with maximum value at the center. For each image we sample | 342 kernel size, with maximum value at the center. For each image we sample |
342 averaging centers between the original image and the filtered one. We | 344 averaging centers between the original image and the filtered one. We |
343 initialize to zero a mask matrix of the image size. For each selected pixel | 345 initialize to zero a mask matrix of the image size. For each selected pixel |
344 we add to the mask the averaging window centered on it. The final image is | 346 we add to the mask the averaging window centered on it. The final image is |
345 computed from the following element-wise operation: $\frac{image + filtered | 347 computed from the following element-wise operation: $\frac{image + filtered |
346 image \times mask}{mask+1}$. | 348 image \times mask}{mask+1}$. |
347 This filter is skipped with probability 75\%. | 349 This module is skipped with probability 75\%. |
348 %\end{minipage} | 350 %\end{minipage} |
349 | 351 |
350 \newpage | 352 \newpage |
351 | 353 |
352 \vspace*{-9mm} | 354 \vspace*{-9mm} |
362 \end{center} | 364 \end{center} |
363 \end{wrapfigure} | 365 \end{wrapfigure} |
364 %\end{minipage}% | 366 %\end{minipage}% |
365 %\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth} | 367 %\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth} |
366 %\vspace*{-20mm} | 368 %\vspace*{-20mm} |
367 This filter permutes neighbouring pixels. It first selects a | 369 This module {\bf permutes neighbouring pixels}. It first selects a |
368 fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them is then | 370 fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them is then |
369 sequentially exchanged with one other in its $V4$ neighbourhood. | 371 sequentially exchanged with one other in its $V4$ neighbourhood. |
370 This filter is skipped with probability 80\%.\\ | 372 This module is skipped with probability 80\%.\\ |
371 \vspace*{1mm} | 373 \vspace*{1mm} |
372 \end{minipage} | 374 \end{minipage} |
373 | 375 |
374 \vspace{-1mm} | 376 \vspace{-3mm} |
375 | 377 |
376 \begin{minipage}[t]{\linewidth} | 378 \begin{minipage}[t]{\linewidth} |
377 \begin{wrapfigure}[7]{l}{0.15\textwidth} | 379 \begin{wrapfigure}[7]{l}{0.15\textwidth} |
378 %\vspace*{-3mm} | 380 %\vspace*{-3mm} |
379 \begin{center} | 381 \begin{center} |
385 \end{center} | 387 \end{center} |
386 \end{wrapfigure} | 388 \end{wrapfigure} |
387 %\end{minipage}% | 389 %\end{minipage}% |
388 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} | 390 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} |
389 \vspace*{12mm} | 391 \vspace*{12mm} |
390 This filter simply adds, to each pixel of the image independently, a | 392 The {\bf Gaussian noise} module simply adds, to each pixel of the image independently, a |
391 noise $\sim Normal(0,(\frac{complexity}{10})^2)$. | 393 noise $\sim Normal(0,(\frac{complexity}{10})^2)$. |
392 This filter is skipped with probability 70\%. | 394 This module is skipped with probability 70\%. |
393 %\vspace{1.1cm} | 395 %\vspace{1.1cm} |
394 \end{minipage} | 396 \end{minipage} |
395 | 397 |
396 \vspace*{1.5cm} | 398 \vspace*{1.2cm} |
397 | 399 |
398 \begin{minipage}[t]{\linewidth} | 400 \begin{minipage}[t]{\linewidth} |
399 \begin{minipage}[t]{0.14\linewidth} | 401 \begin{minipage}[t]{0.14\linewidth} |
400 \centering | 402 \centering |
401 \includegraphics[scale=.4]{images/background_other_only.png}\\ | 403 \includegraphics[scale=.4]{images/background_other_only.png}\\ |
402 {\small \bf Bg Image} | 404 {\small \bf Bg Image} |
403 \end{minipage}% | 405 \end{minipage}% |
404 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} | 406 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} |
405 \vspace*{-18mm} | 407 \vspace*{-18mm} |
406 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random | 408 Following~\citet{Larochelle-jmlr-2009}, the {\bf background image} module adds a random |
407 background image behind the letter, from a randomly chosen natural image, | 409 background image behind the letter, from a randomly chosen natural image, |
408 with contrast adjustments depending on $complexity$, to preserve | 410 with contrast adjustments depending on $complexity$, to preserve |
409 more or less of the original character image. | 411 more or less of the original character image. |
410 %\vspace{.8cm} | 412 %\vspace{.8cm} |
411 \end{minipage} | 413 \end{minipage} |
417 \includegraphics[scale=.4]{images/Poivresel_only.png}\\ | 419 \includegraphics[scale=.4]{images/Poivresel_only.png}\\ |
418 {\small \bf Salt \& Pepper} | 420 {\small \bf Salt \& Pepper} |
419 \end{minipage}% | 421 \end{minipage}% |
420 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} | 422 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} |
421 \vspace*{-18mm} | 423 \vspace*{-18mm} |
422 This filter adds noise $\sim U[0,1]$ to random subsets of pixels. | 424 The {\bf salt and pepper noise} module adds noise $\sim U[0,1]$ to random subsets of pixels. |
423 The number of selected pixels is $0.2 \times complexity$. | 425 The number of selected pixels is $0.2 \times complexity$. |
424 This filter is skipped with probability 75\%. | 426 This module is skipped with probability 75\%. |
425 %\vspace{.9cm} | 427 %\vspace{.9cm} |
426 \end{minipage} | 428 \end{minipage} |
427 %\vspace{-.7cm} | 429 %\vspace{-.7cm} |
428 | 430 |
429 \vspace{1mm} | 431 \vspace{1mm} |
439 %\end{minipage}% | 441 %\end{minipage}% |
440 \end{center} | 442 \end{center} |
441 \end{wrapfigure} | 443 \end{wrapfigure} |
442 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} | 444 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} |
443 %\vspace{.4cm} | 445 %\vspace{.4cm} |
444 The scratches module places line-like white patches on the image. The | 446 The {\bf scratches} module places line-like white patches on the image. The |
445 lines are heavily transformed images of the digit ``1'' (one), chosen | 447 lines are heavily transformed images of the digit ``1'' (one), chosen |
446 at random among 500 such 1 images, | 448 at random among 500 such 1 images, |
447 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times | 449 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times |
448 complexity)^2)$ (in degrees), using bi-cubic interpolation. | 450 complexity)^2)$ (in degrees), using bi-cubic interpolation. |
449 Two passes of a grey-scale morphological erosion filter | 451 Two passes of a grey-scale morphological erosion filter |
450 are applied, reducing the width of the line | 452 are applied, reducing the width of the line |
451 by an amount controlled by $complexity$. | 453 by an amount controlled by $complexity$. |
452 This filter is skipped with probability 85\%. The probabilities | 454 This module is skipped with probability 85\%. The probabilities |
453 of applying 1, 2, or 3 patches are (50\%,30\%,20\%). | 455 of applying 1, 2, or 3 patches are (50\%,30\%,20\%). |
454 \end{minipage} | 456 \end{minipage} |
455 | 457 |
456 \vspace*{2mm} | 458 \vspace*{2mm} |
457 | 459 |
458 \begin{minipage}[t]{0.20\linewidth} | 460 \begin{minipage}[t]{0.25\linewidth} |
459 \centering | 461 \centering |
460 \hspace*{-7mm}\includegraphics[scale=.4]{images/Contrast_only.png}\\ | 462 \hspace*{-16mm}\includegraphics[scale=.4]{images/Contrast_only.png}\\ |
461 {\bf Grey \& Contrast} | 463 {\bf Grey Level \& Contrast} |
462 \end{minipage}% | 464 \end{minipage}% |
463 \hspace{-4mm}\begin{minipage}[t]{0.82\linewidth} | 465 \hspace{-12mm}\begin{minipage}[t]{0.82\linewidth} |
464 \vspace*{-18mm} | 466 \vspace*{-18mm} |
465 This filter changes the contrast by changing grey levels, and may invert the image polarity (white | 467 The {\bf grey level and contrast} module changes the contrast by changing grey levels, and may invert the image polarity (white |
466 to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$ | 468 to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$ |
467 so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The | 469 so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The |
468 polarity is inverted with probability 50\%. | 470 polarity is inverted with probability 50\%. |
469 %\vspace{.7cm} | 471 %\vspace{.7cm} |
470 \end{minipage} | 472 \end{minipage} |
708 \label{fig:error-rates-charts} | 710 \label{fig:error-rates-charts} |
709 \vspace*{-2mm} | 711 \vspace*{-2mm} |
710 \end{figure} | 712 \end{figure} |
711 | 713 |
712 | 714 |
715 \begin{figure}[ht] | |
716 \vspace*{-3mm} | |
717 \centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}} | |
718 \vspace*{-3mm} | |
719 \caption{Relative improvement in error rate due to self-taught learning. | |
720 Left: Improvement (or loss, when negative) | |
721 induced by out-of-distribution examples (perturbed data). | |
722 Right: Improvement (or loss, when negative) induced by multi-task | |
723 learning (training on all classes and testing only on either digits, | |
724 upper case, or lower-case). The deep learner (SDA) benefits more from | |
725 both self-taught learning scenarios, compared to the shallow MLP.} | |
726 \label{fig:improvements-charts} | |
727 \vspace*{-2mm} | |
728 \end{figure} | |
729 | |
713 \section{Experimental Results} | 730 \section{Experimental Results} |
714 \vspace*{-2mm} | 731 \vspace*{-2mm} |
715 | 732 |
716 %\vspace*{-1mm} | 733 %\vspace*{-1mm} |
717 %\subsection{SDA vs MLP vs Humans} | 734 %\subsection{SDA vs MLP vs Humans} |
736 and the 10-class (digits) task. | 753 and the 10-class (digits) task. |
737 17\% error (SDA1) or 18\% error (humans) may seem large but a large | 754 17\% error (SDA1) or 18\% error (humans) may seem large but a large |
738 majority of the errors from humans and from SDA1 are from out-of-context | 755 majority of the errors from humans and from SDA1 are from out-of-context |
739 confusions (e.g. a vertical bar can be a ``1'', an ``l'' or an ``L'', and a | 756 confusions (e.g. a vertical bar can be a ``1'', an ``l'' or an ``L'', and a |
740 ``c'' and a ``C'' are often indistinguishable). | 757 ``c'' and a ``C'' are often indistinguishable). |
741 | |
742 \begin{figure}[ht] | |
743 \vspace*{-3mm} | |
744 \centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/improvements_charts.pdf}}} | |
745 \vspace*{-3mm} | |
746 \caption{Relative improvement in error rate due to self-taught learning. | |
747 Left: Improvement (or loss, when negative) | |
748 induced by out-of-distribution examples (perturbed data). | |
749 Right: Improvement (or loss, when negative) induced by multi-task | |
750 learning (training on all classes and testing only on either digits, | |
751 upper case, or lower-case). The deep learner (SDA) benefits more from | |
752 both self-taught learning scenarios, compared to the shallow MLP.} | |
753 \label{fig:improvements-charts} | |
754 \vspace*{-2mm} | |
755 \end{figure} | |
756 | 758 |
757 In addition, as shown in the left of | 759 In addition, as shown in the left of |
758 Figure~\ref{fig:improvements-charts}, the relative improvement in error | 760 Figure~\ref{fig:improvements-charts}, the relative improvement in error |
759 rate brought by self-taught learning is greater for the SDA, and these | 761 rate brought by self-taught learning is greater for the SDA, and these |
760 differences with the MLP are statistically and qualitatively | 762 differences with the MLP are statistically and qualitatively |