comparison writeup/nips2010_submission.tex @ 560:dc5c3f538a05
Small fixes (typos / precisions)
author | Olivier Delalleau <delallea@iro> |
---|---|
date | Thu, 03 Jun 2010 11:02:39 -0400 |
parents | cf5a7ee2d892 |
children | b9b811e886ae |
559:cf5a7ee2d892 | 560:dc5c3f538a05 |
---|---|
66 converted into a deep supervised feedforward neural network and fine-tuned by | 66 converted into a deep supervised feedforward neural network and fine-tuned by |
67 stochastic gradient descent. | 67 stochastic gradient descent. |
68 | 68 |
69 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles | 69 Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles |
70 of semi-supervised and multi-task learning: the learner can exploit examples | 70 of semi-supervised and multi-task learning: the learner can exploit examples |
71 that are unlabeled and/or come from a distribution different from the target | 71 that are unlabeled and possibly come from a distribution different from the target |
72 distribution, e.g., from other classes than those of interest. | 72 distribution, e.g., from other classes than those of interest. |
73 It has already been shown that deep learners can clearly take advantage of | 73 It has already been shown that deep learners can clearly take advantage of |
74 unsupervised learning and unlabeled examples~\citep{Bengio-2009,WestonJ2008-small}, | 74 unsupervised learning and unlabeled examples~\citep{Bengio-2009,WestonJ2008-small}, |
75 but more needs to be done to explore the impact | 75 but more needs to be done to explore the impact |
76 of {\em out-of-distribution} examples and of the multi-task setting | 76 of {\em out-of-distribution} examples and of the multi-task setting |
127 \end{wrapfigure} | 127 \end{wrapfigure} |
128 %\vspace{0.7cm} | 128 %\vspace{0.7cm} |
129 %\end{minipage}% | 129 %\end{minipage}% |
130 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 130 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} |
131 This section describes the different transformations we used to stochastically | 131 This section describes the different transformations we used to stochastically |
132 transform source images such as the one on the left | 132 transform $32 \times 32$ source images (such as the one on the left) |
133 in order to obtain data from a larger distribution which | 133 in order to obtain data from a larger distribution which |
134 covers a domain substantially larger than the clean characters distribution from | 134 covers a domain substantially larger than the clean characters distribution from |
135 which we start. | 135 which we start. |
136 Although character transformations have been used before to | 136 Although character transformations have been used before to |
137 improve character recognizers, this effort is on a large scale both | 137 improve character recognizers, this effort is on a large scale both |
174 increasing dimensions (largest is $5\times5$) were used. For each image, | 174 increasing dimensions (largest is $5\times5$) were used. For each image, |
175 we randomly sample the operator type (dilation or erosion) with equal probability and one structuring | 175 we randomly sample the operator type (dilation or erosion) with equal probability and one structuring |
176 element from a subset of the $n=round(m \times complexity)$ smallest structuring elements | 176 element from a subset of the $n=round(m \times complexity)$ smallest structuring elements |
177 where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters). | 177 where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters). |
178 A neutral element (no transformation) | 178 A neutral element (no transformation) |
179 is always present in the set. is applied. | 179 is always present in the set. |
180 %\vspace{.4cm} | 180 %\vspace{.4cm} |
181 %\end{minipage} | 181 %\end{minipage} |
182 %\vspace{-.7cm} | 182 %\vspace{-.7cm} |
183 | 183 |
184 \begin{minipage}[b]{0.14\linewidth} | 184 \begin{minipage}[b]{0.14\linewidth} |
185 \centering | 185 \centering |
186 \includegraphics[scale=.4]{images/Slant_only.png}\\ | 186 \includegraphics[scale=.4]{images/Slant_only.png}\\ |
187 {\bf Slant} | 187 {\bf Slant} |
188 \end{minipage}% | 188 \end{minipage}% |
189 \hspace{0.3cm}\begin{minipage}[b]{0.83\linewidth} | 189 \hspace{0.3cm} |
190 \begin{minipage}[b]{0.83\linewidth} | |
190 %\centering | 191 %\centering |
191 %\vspace*{-15mm} | |
192 To produce {\bf slant}, each row of the image is shifted | 192 To produce {\bf slant}, each row of the image is shifted |
193 proportionally to its vertical position: $shift = round(slant \times height)$. | 193 proportionally to its vertical position: $shift = round(slant \times height)$. |
194 $slant \sim U[-complexity,complexity]$. | 194 $slant \sim U[-complexity,complexity]$. |
195 \vspace{1.5cm} | 195 The shift is randomly chosen to be either to the left or to the right. |
196 \vspace{1.1cm} | |
196 \end{minipage} | 197 \end{minipage} |
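A minimal sketch of the slant transformation, assuming $height$ denotes the row index of each row (an interpretation; the text does not define it explicitly):

```python
import numpy as np

def slant_image(img, slant):
    """Shift row y horizontally by round(slant * y); with
    slant ~ U[-complexity, complexity] the sign of slant picks the
    direction. Pixels shifted in from outside the image become 0."""
    out = np.zeros_like(img)
    h, w = img.shape
    for y in range(h):
        shift = int(round(slant * y))
        for x in range(w):
            src = x - shift
            if 0 <= src < w:
                out[y, x] = img[y, src]
    return out
```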
197 %\vspace*{-4mm} | 198 %\vspace*{-4mm} |
198 | 199 |
199 %\begin{minipage}[b]{0.14\linewidth} | 200 %\begin{minipage}[b]{0.14\linewidth} |
200 %\centering | 201 %\centering |
211 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$. | 212 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$. |
212 Output pixel $(x,y)$ takes the value of input pixel | 213 Output pixel $(x,y)$ takes the value of input pixel |
213 nearest to $(ax+by+c,dx+ey+f)$, | 214 nearest to $(ax+by+c,dx+ey+f)$, |
214 producing scaling, translation, rotation and shearing. | 215 producing scaling, translation, rotation and shearing. |
215 Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to | 216 Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to |
216 forbid large rotations (not to confuse classes) but to give good | 217 forbid large rotations (to avoid confusing classes) but to give good |
217 variability of the transformation: $a$ and $d$ $\sim U[1-3 | 218 variability of the transformation: $a$ and $d$ $\sim U[1-3 |
218 complexity,1+3\,complexity]$, $b$ and $e$ $\sim[-3 \,complexity,3\, | 219 complexity,1+3\,complexity]$, $b$ and $e$ $\sim U[-3 \,complexity,3\, |
219 complexity]$ and $c$ and $f$ $\sim U[-4 \,complexity, 4 \, | 220 complexity]$, and $c$ and $f \sim U[-4 \,complexity, 4 \, |
220 complexity]$.\\ | 221 complexity]$.\\ |
221 %\end{minipage} | 222 %\end{minipage} |
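The affine parameter sampling and nearest-pixel lookup described above could look as follows. This is a sketch: the coordinate convention ($x$ = column, $y$ = row) and the zero background for out-of-range sources are our assumptions:

```python
import numpy as np

def sample_affine(complexity, rng):
    """Marginals from the text: a,d ~ U[1-3c, 1+3c]; b,e ~ U[-3c, 3c];
    c,f ~ U[-4c, 4c], where c is the complexity level."""
    c3, c4 = 3 * complexity, 4 * complexity
    a, d = rng.uniform(1 - c3, 1 + c3, size=2)
    b, e = rng.uniform(-c3, c3, size=2)
    c, f = rng.uniform(-c4, c4, size=2)
    return a, b, c, d, e, f

def affine_nearest(img, params):
    """Output pixel (x, y) takes the value of the input pixel nearest
    to (a*x + b*y + c, d*x + e*y + f); out-of-range sources give 0."""
    a, b, c, d, e, f = params
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            sx = int(round(a * x + b * y + c))
            sy = int(round(d * x + e * y + f))
            if 0 <= sx < w and 0 <= sy < h:
                out[y, x] = img[sy, sx]
    return out
```

With $complexity = 0$ all marginals collapse to the identity transform $(1,0,0,0,1,0)$.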
222 | 223 |
223 \vspace*{-4.5mm} | 224 \vspace*{-4.5mm} |
224 | 225 |
257 \end{center} | 258 \end{center} |
258 \end{wrapfigure} | 259 \end{wrapfigure} |
259 %\vspace{.6cm} | 260 %\vspace{.6cm} |
260 %\end{minipage}% | 261 %\end{minipage}% |
261 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} | 262 %\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth} |
262 The {\bf pinch} module applies the ``Whirl and pinch'' GIMP filter with whirl was set to 0. | 263 The {\bf pinch} module applies the ``Whirl and pinch'' GIMP filter with whirl set to 0. |
263 A pinch is ``similar to projecting the image onto an elastic | 264 A pinch is ``similar to projecting the image onto an elastic |
264 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). | 265 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). |
265 For a square input image, draw a radius-$r$ disk | 266 For a square input image, draw a radius-$r$ disk |
266 around $C$. Any pixel $P$ belonging to | 267 around its center $C$. Any pixel $P$ belonging to |
267 that disk has its value replaced by | 268 that disk has its value replaced by |
268 the value of a ``source'' pixel in the original image, | 269 the value of a ``source'' pixel in the original image, |
269 on the line that goes through $C$ and $P$, but | 270 on the line that goes through $C$ and $P$, but |
270 at some other distance $d_2$. Define $d_1=distance(P,C) = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times | 271 at some other distance $d_2$. Define $d_1=distance(P,C)$ |
272 and $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times | |
271 d_1$, where $pinch$ is a parameter of the filter. | 273 d_1$, where $pinch$ is a parameter of the filter. |
272 The actual value is given by bilinear interpolation considering the pixels | 274 The actual value is given by bilinear interpolation considering the pixels |
273 around the (non-integer) source position thus found. | 275 around the (non-integer) source position thus found. |
274 Here $pinch \sim U[-complexity, 0.7 \times complexity]$. | 276 Here $pinch \sim U[-complexity, 0.7 \times complexity]$. |
275 %\vspace{1.5cm} | 277 %\vspace{1.5cm} |
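A sketch of the pinch mapping under the stated formula $d_2 = \sin(\frac{\pi d_1}{2r})^{-pinch} \times d_1$; the bilinear helper and the default choice of radius are our assumptions, not GIMP's exact implementation:

```python
import numpy as np

def bilinear(img, y, x):
    """Bilinear interpolation at a (possibly non-integer) position."""
    h, w = img.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0, x0] * (1 - fx) + img[y0, x1] * fx
    bot = img[y1, x0] * (1 - fx) + img[y1, x1] * fx
    return top * (1 - fy) + bot * fy

def pinch_image(img, pinch, r=None):
    """For each pixel P inside the radius-r disk around the center C,
    replace its value by a bilinear sample on the line through C and P
    at distance d2 = sin(pi*d1/(2r))**(-pinch) * d1, with d1 = |P - C|."""
    h, w = img.shape
    r = r if r is not None else min(h, w) / 2.0
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = img.copy()
    for y in range(h):
        for x in range(w):
            d1 = np.hypot(y - cy, x - cx)
            if d1 == 0.0 or d1 >= r:
                continue  # outside the disk (or the center itself): unchanged
            d2 = np.sin(np.pi * d1 / (2 * r)) ** (-pinch) * d1
            out[y, x] = bilinear(img, cy + (y - cy) * d2 / d1,
                                 cx + (x - cx) * d2 / d1)
    return out
```

Note that $pinch = 0$ gives $d_2 = d_1$, i.e. the identity, which matches the distribution $U[-complexity, 0.7 \times complexity]$ straddling zero.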
308 \end{minipage}% | 310 \end{minipage}% |
309 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} | 311 \hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth} |
310 \vspace*{-18mm} | 312 \vspace*{-18mm} |
311 The {\bf occlusion} module selects a random rectangle from an {\em occluder} character | 313 The {\bf occlusion} module selects a random rectangle from an {\em occluder} character |
312 image and places it over the original {\em occluded} | 314 image and places it over the original {\em occluded} |
313 image. Pixels are combined by taking the max(occluder,occluded), | 315 image. Pixels are combined by taking the max(occluder, occluded), |
314 closer to black. The rectangle corners | 316 i.e. keeping the lighter ones. |
317 The rectangle corners | |
315 are sampled so that larger complexity gives larger rectangles. | 318 are sampled so that larger complexity gives larger rectangles. |
316 The destination position in the occluded image is also sampled | 319 The destination position in the occluded image is also sampled |
317 according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}). | 320 according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}). |
318 This module is skipped with probability 60\%. | 321 This module is skipped with probability 60\%. |
319 %\vspace{7mm} | 322 %\vspace{7mm} |
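The max-combination step can be illustrated as below. The rectangle sampling, the normally distributed destination, and the 60% skip are omitted, and the names are hypothetical; whether max keeps the darker or the lighter pixels depends on the pixel convention (here higher values are assumed to be the strokes):

```python
import numpy as np

def occlude(occluded, occluder, src_rect, dest):
    """Copy a rectangle (top, left, height, width) from the occluder
    image onto the occluded image at dest = (top, left), combining the
    overlapping pixels with an element-wise max."""
    t, l, hh, ww = src_rect
    patch = occluder[t:t + hh, l:l + ww]
    dt, dl = dest
    out = occluded.copy()
    region = out[dt:dt + hh, dl:dl + ww]
    out[dt:dt + hh, dl:dl + ww] = np.maximum(region, patch)
    return out
```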
332 \end{wrapfigure} | 335 \end{wrapfigure} |
333 %\vspace{.5cm} | 336 %\vspace{.5cm} |
334 %\end{minipage}% | 337 %\end{minipage}% |
335 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} | 338 %\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth} |
336 With the {\bf Gaussian smoothing} module, | 339 With the {\bf Gaussian smoothing} module, |
337 different regions of the image are spatially smoothed by convolving | 340 different regions of the image are spatially smoothed. |
338 the image with a symmetric Gaussian kernel of | 341 This is achieved by first convolving |
342 the image with an isotropic Gaussian kernel of | |
339 size and variance chosen uniformly in the ranges $[12,12 + 20 \times | 343 size and variance chosen uniformly in the ranges $[12,12 + 20 \times |
340 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized | 344 complexity]$ and $[2,2 + 6 \times complexity]$. This filtered image is normalized |
341 between $0$ and $1$. We also create a symmetric weighted averaging window, of the | 345 between $0$ and $1$. We also create an isotropic weighted averaging window, of the |
342 kernel size, with maximum value at the center. For each image we sample | 346 kernel size, with maximum value at the center. For each image we sample |
343 uniformly from $3$ to $3 + 10 \times complexity$ the number of pixels that will serve as | 347 uniformly from $3$ to $3 + 10 \times complexity$ the number of pixels that will serve as |
344 averaging centers between the original image and the filtered one. We | 348 averaging centers between the original image and the filtered one. We |
345 initialize to zero a mask matrix of the image size. For each selected pixel | 349 initialize to zero a mask matrix of the image size. For each selected pixel |
346 we add to the mask the averaging window centered to it. The final image is | 350 we add to the mask the averaging window centered on it. The final image is |
347 computed from the following element-wise operation: $\frac{image + filtered | 351 computed from the following element-wise operation: $\frac{image + filtered\_image |
348 image \times mask}{mask+1}$. | 352 \times mask}{mask+1}$. |
349 This module is skipped with probability 75\%. | 353 This module is skipped with probability 75\%. |
350 %\end{minipage} | 354 %\end{minipage} |
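The kernel construction and the final element-wise blend can be sketched directly from the formula above. Helper names are hypothetical, and the accumulation of per-center averaging windows into the mask is left to the caller:

```python
import numpy as np

def gaussian_kernel(size, var):
    """Isotropic 2-D Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-ax**2 / (2.0 * var))
    k = np.outer(g, g)
    return k / k.sum()

def blend_smoothed(image, filtered_image, mask):
    """Element-wise blend from the text:
    (image + filtered_image * mask) / (mask + 1).
    Where the mask is 0 the original image is kept; where it is large
    the output approaches the filtered image."""
    return (image + filtered_image * mask) / (mask + 1.0)
```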
351 | 355 |
352 \newpage | 356 \newpage |
353 | 357 |
364 \end{center} | 368 \end{center} |
365 \end{wrapfigure} | 369 \end{wrapfigure} |
366 %\end{minipage}% | 370 %\end{minipage}% |
367 %\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth} | 371 %\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth} |
368 %\vspace*{-20mm} | 372 %\vspace*{-20mm} |
369 This module {\bf permutes neighbouring pixels}. It first selects | 373 This module {\bf permutes neighbouring pixels}. It first selects a |
370 fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each of them are then | 374 fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each |
371 sequentially exchanged with one other in as $V4$ neighbourhood. | 375 of these pixels is then sequentially exchanged with a random pixel |
376 among its four nearest neighbors (on its left, right, top or bottom). | |
372 This module is skipped with probability 80\%.\\ | 377 This module is skipped with probability 80\%.\\ |
373 \vspace*{1mm} | 378 \vspace*{1mm} |
374 \end{minipage} | 379 \end{minipage} |
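A sketch of the pixel-permutation module under the description above (the exact selection and traversal order in the real pipeline may differ):

```python
import numpy as np

def permute_pixels(img, complexity, rng):
    """Swap a fraction complexity/3 of randomly selected pixels, each
    with a randomly chosen 4-neighbour (left, right, top or bottom),
    sequentially; swaps that would fall outside the image are skipped."""
    out = img.copy()
    h, w = img.shape
    n = int(round(h * w * complexity / 3.0))
    offsets = ((-1, 0), (1, 0), (0, -1), (0, 1))
    for _ in range(n):
        y = int(rng.integers(0, h))
        x = int(rng.integers(0, w))
        dy, dx = offsets[int(rng.integers(0, 4))]
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:
            out[y, x], out[ny, nx] = out[ny, nx], out[y, x]
    return out
```

Because the module only swaps pixel values, the multiset of grey levels in the image is preserved exactly.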
375 | 380 |
376 \vspace{-3mm} | 381 \vspace{-3mm} |
453 by an amount controlled by $complexity$. | 458 by an amount controlled by $complexity$. |
454 This module is skipped with probability 85\%. The probabilities | 459 This module is skipped with probability 85\%. The probabilities |
455 of applying 1, 2, or 3 patches are (50\%,30\%,20\%). | 460 of applying 1, 2, or 3 patches are (50\%,30\%,20\%). |
456 \end{minipage} | 461 \end{minipage} |
457 | 462 |
458 \vspace*{2mm} | 463 \vspace*{1mm} |
459 | 464 |
460 \begin{minipage}[t]{0.25\linewidth} | 465 \begin{minipage}[t]{0.25\linewidth} |
461 \centering | 466 \centering |
462 \hspace*{-16mm}\includegraphics[scale=.4]{images/Contrast_only.png}\\ | 467 \hspace*{-16mm}\includegraphics[scale=.4]{images/Contrast_only.png}\\ |
463 {\bf Grey Level \& Contrast} | 468 {\bf Grey Level \& Contrast} |
464 \end{minipage}% | 469 \end{minipage}% |
465 \hspace{-12mm}\begin{minipage}[t]{0.82\linewidth} | 470 \hspace{-12mm}\begin{minipage}[t]{0.82\linewidth} |
466 t -m "\vspace*{-18mm} | 471 \vspace*{-18mm} |
467 The {\bf grey level and contrast} module changes the contrast by changing grey levels, and may invert the image polarity (white | 472 The {\bf grey level and contrast} module changes the contrast by changing grey levels, and may invert the image polarity (white |
468 to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$ | 473 to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$ |
469 so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The | 474 so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The |
470 polarity is inverted with probability 50\%. | 475 polarity is inverted with probability 50\%. |
471 %\vspace{.7cm} | 476 %\vspace{.7cm} |
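The contrast rescaling and polarity flip can be written directly from the formulas above (function name and RNG are illustrative):

```python
import numpy as np

def grey_contrast(img, complexity, rng):
    """Sample C ~ U[1 - 0.85*complexity, 1], rescale pixel values from
    [0, 1] into [(1-C)/2, 1-(1-C)/2], and invert polarity (white to
    black and black to white) with probability 50%."""
    C = rng.uniform(1.0 - 0.85 * complexity, 1.0)
    lo = (1.0 - C) / 2.0
    out = lo + img * C          # linear map of [0, 1] onto [lo, 1 - lo]
    if rng.random() < 0.5:
        out = 1.0 - out         # polarity inversion
    return out
```

With $complexity = 0$ we get $C = 1$, so the module reduces to an optional polarity inversion.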
484 (bottom right) is used as training example.} | 489 (bottom right) is used as training example.} |
485 \label{fig:pipeline} | 490 \label{fig:pipeline} |
486 \end{figure} | 491 \end{figure} |
487 \fi | 492 \fi |
488 | 493 |
489 | 494 \vspace*{-3mm} |
490 \vspace*{-2mm} | |
491 \section{Experimental Setup} | 495 \section{Experimental Setup} |
492 \vspace*{-1mm} | 496 \vspace*{-1mm} |
493 | 497 |
494 Much previous work on deep learning has been performed on | 498 Much previous work on deep learning has been performed on |
495 the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009}, | 499 the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009}, |