comparison writeup/nips2010_submission.tex @ 560:dc5c3f538a05

Small fixes (typos / precisions)
author Olivier Delalleau <delallea@iro>
date Thu, 03 Jun 2010 11:02:39 -0400
parents cf5a7ee2d892
children b9b811e886ae
converted into a deep supervised feedforward neural network and fine-tuned by
stochastic gradient descent.

Self-taught learning~\citep{RainaR2007} is a paradigm that combines principles
of semi-supervised and multi-task learning: the learner can exploit examples
that are unlabeled and possibly come from a distribution different from the target
distribution, e.g., from other classes than those of interest.
It has already been shown that deep learners can clearly take advantage of
unsupervised learning and unlabeled examples~\citep{Bengio-2009,WestonJ2008-small},
but more needs to be done to explore the impact
of {\em out-of-distribution} examples and of the multi-task setting
\end{wrapfigure}
%\vspace{0.7cm}
%\end{minipage}%
%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
This section describes the different transformations we used to stochastically
transform $32 \times 32$ source images (such as the one on the left)
in order to obtain data from a larger distribution which
covers a domain substantially larger than the clean characters distribution from
which we start.
Although character transformations have been used before to
improve character recognizers, this effort is on a large scale both
increasing dimensions (largest is $5\times5$) were used. For each image,
randomly sample the operator type (dilation or erosion) with equal probability and one structural
element from a subset of the $n=round(m \times complexity)$ smallest structuring elements
where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters).
A neutral element (no transformation)
is always present in the set.
%\vspace{.4cm}
%\end{minipage}
%\vspace{-.7cm}
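The sampling logic above can be sketched in NumPy (a hedged sketch, not the authors' code: the structuring-element set is reduced to illustrative square footprints, and `morph_transform` is a hypothetical name):

```python
import numpy as np

def morph_transform(img, complexity, rng):
    """Random dilation or erosion as described above. The structuring
    elements here are illustrative square footprints (1x1 up to 5x5);
    the 1x1 element is the neutral 'no transformation' case."""
    elements = [np.ones((k, k), bool) for k in (1, 2, 3, 4, 5)]
    dilate = rng.rand() < 0.5            # operator type, equal probability
    m = 10 if dilate else 6              # m = 6 for erosion (thin characters)
    n = max(1, min(len(elements), int(round(m * complexity))))
    se = elements[rng.randint(n)]        # one of the n smallest elements
    kh, kw = se.shape
    pad = np.pad(img, ((0, kh - 1), (0, kw - 1)),
                 constant_values=(0.0 if dilate else 1.0))
    H, W = img.shape
    # brute-force grey-scale morphology: max (dilation) / min (erosion)
    windows = [pad[i:i + H, j:j + W]
               for i in range(kh) for j in range(kw) if se[i, j]]
    return np.max(windows, axis=0) if dilate else np.min(windows, axis=0)
```

With $complexity = 0$ only the neutral element can be drawn, so the image passes through unchanged.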

\begin{minipage}[b]{0.14\linewidth}
\centering
\includegraphics[scale=.4]{images/Slant_only.png}\\
{\bf Slant}
\end{minipage}%
\hspace{0.3cm}
\begin{minipage}[b]{0.83\linewidth}
%\centering
To produce {\bf slant}, each row of the image is shifted
proportionally to its height: $shift = round(slant \times height)$.
$slant \sim U[-complexity,complexity]$.
The shift is randomly chosen to be either to the left or to the right.
\vspace{1.1cm}
\end{minipage}
%\vspace*{-4mm}
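A minimal NumPy sketch of this row shift (assumptions: "height" is interpreted as the row index, and `np.roll` wraps pixels around rather than padding with background as the real filter likely does):

```python
import numpy as np

def slant_rows(img, complexity, rng):
    """Shift each row horizontally in proportion to its vertical position.
    shift = round(slant * row_index), slant ~ U[-complexity, complexity]."""
    s = rng.uniform(-complexity, complexity)
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        out[y] = np.roll(img[y], int(round(s * y)))  # wrap-around shift
    return out
```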

%\begin{minipage}[b]{0.14\linewidth}
%\centering
parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$.
Output pixel $(x,y)$ takes the value of input pixel
nearest to $(ax+by+c,dx+ey+f)$,
producing scaling, translation, rotation and shearing.
Marginal distributions of $(a,b,c,d,e,f)$ have been tuned to
forbid large rotations (to avoid confusing classes) but to give good
variability of the transformation: $a$ and $d$ $\sim U[1-3\,complexity,1+3\,complexity]$,
$b$ and $e$ $\sim U[-3\,complexity,3\,complexity]$, and $c$ and $f \sim U[-4\,complexity,4\,complexity]$.\\
%\end{minipage}

\vspace*{-4.5mm}

\end{center}
\end{wrapfigure}
%\vspace{.6cm}
%\end{minipage}%
%\hspace{0.3cm}\begin{minipage}[b]{0.86\linewidth}
The {\bf pinch} module applies the ``Whirl and pinch'' GIMP filter with whirl set to 0.
A pinch is ``similar to projecting the image onto an elastic
surface and pressing or pulling on the center of the surface'' (GIMP documentation manual).
For a square input image, draw a radius-$r$ disk
around its center $C$. Any pixel $P$ belonging to
that disk has its value replaced by
the value of a ``source'' pixel in the original image,
on the line that goes through $C$ and $P$, but
at some other distance $d_2$. Define $d_1=distance(P,C)$
and $d_2 = \sin(\frac{\pi{}d_1}{2r})^{-pinch} \times d_1$, where $pinch$ is a parameter of the filter.
The actual value is given by bilinear interpolation considering the pixels
around the (non-integer) source position thus found.
Here $pinch \sim U[-complexity, 0.7 \times complexity]$.
%\vspace{1.5cm}
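The pinch mapping can be sketched as below (assumptions: nearest-neighbour sampling replaces the bilinear interpolation of the real filter, and the disk radius is taken as the largest inscribed one):

```python
import numpy as np

def pinch(img, pinch_param):
    """Apply d2 = sin(pi*d1/(2r))^(-pinch) * d1 inside the centered disk,
    reading each pixel's value from the source position on the line C-P."""
    H, W = img.shape
    cy, cx = (H - 1) / 2.0, (W - 1) / 2.0
    r = min(cx, cy)                        # disk radius (an assumption)
    out = img.copy()
    for y in range(H):
        for x in range(W):
            d1 = np.hypot(x - cx, y - cy)
            if 0 < d1 <= r:
                d2 = np.sin(np.pi * d1 / (2 * r)) ** (-pinch_param) * d1
                # nearest-neighbour stand-in for bilinear interpolation
                sx = int(round(cx + (x - cx) * d2 / d1))
                sy = int(round(cy + (y - cy) * d2 / d1))
                if 0 <= sx < W and 0 <= sy < H:
                    out[y, x] = img[sy, sx]
    return out
```

Note that $pinch = 0$ gives $d_2 = d_1$ and hence the identity mapping.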
\end{minipage}%
\hspace{0.3cm}\begin{minipage}[t]{0.83\linewidth}
\vspace*{-18mm}
The {\bf occlusion} module selects a random rectangle from an {\em occluder} character
image and places it over the original {\em occluded}
image. Pixels are combined by taking the max(occluder, occluded),
i.e. keeping the lighter ones.
The rectangle corners
are sampled so that larger complexity gives larger rectangles.
The destination position in the occluded image is also sampled
according to a normal distribution (more details in~\citet{ift6266-tr-anonymous}).
This module is skipped with probability 60\%.
%\vspace{7mm}
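A simplified NumPy sketch of the occlusion combine (assumptions: rectangle size and positions are drawn uniformly here, whereas the paper uses normal draws; pixel values are in $[0,1]$ with 1 = white):

```python
import numpy as np

def occlude(occluded, occluder, rng, complexity):
    """Paste a random rectangle from the occluder onto the occluded image,
    combining with an elementwise max (keeping the lighter pixels)."""
    if rng.rand() < 0.6:                  # module skipped 60% of the time
        return occluded.copy()
    H, W = occluded.shape
    h = max(1, int(round(complexity * H)))   # larger complexity ->
    w = max(1, int(round(complexity * W)))   # larger rectangles
    top, left = rng.randint(0, H - h + 1), rng.randint(0, W - w + 1)
    patch = occluder[top:top + h, left:left + w]
    dy, dx = rng.randint(0, H - h + 1), rng.randint(0, W - w + 1)
    out = occluded.copy()
    out[dy:dy + h, dx:dx + w] = np.maximum(out[dy:dy + h, dx:dx + w], patch)
    return out
```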
\end{wrapfigure}
%\vspace{.5cm}
%\end{minipage}%
%\hspace{0.3cm}\begin{minipage}[t]{0.86\linewidth}
With the {\bf Gaussian smoothing} module,
different regions of the image are spatially smoothed.
This is achieved by first convolving
the image with an isotropic Gaussian kernel of
size and variance chosen uniformly in the ranges $[12,12 + 20 \times
complexity]$ and $[2,2 + 6 \times complexity]$. This filtered image is normalized
between $0$ and $1$. We also create an isotropic weighted averaging window, of the
kernel size, with maximum value at the center. For each image we sample
uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be
averaging centers between the original image and the filtered one. We
initialize to zero a mask matrix of the image size. For each selected pixel
we add to the mask the averaging window centered on it. The final image is
computed from the following element-wise operation: $\frac{image + filtered\_image
\times mask}{mask+1}$.
This module is skipped with probability 75\%.
%\end{minipage}
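The whole procedure can be sketched as follows (a hedged sketch: the blur uses a naive convolution for self-containment, the averaging window is assumed to be a rescaled Gaussian, and border handling is a guess):

```python
import numpy as np

def gaussian_kernel(size, var):
    # isotropic 2-D Gaussian, peak at the center, normalized to sum to 1
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * var))
    k = np.outer(g, g)
    return k / k.sum()

def local_smooth(img, complexity, rng):
    """Blur the image, then blend the blurred version back only around a
    few 'averaging centers' via (image + filtered * mask) / (mask + 1)."""
    size = int(rng.uniform(12, 12 + 20 * complexity)) | 1   # odd kernel size
    var = rng.uniform(2, 2 + 6 * complexity)
    k = gaussian_kernel(size, var)
    H, W = img.shape
    pad = size // 2
    padded = np.pad(img, pad, mode='edge')
    filtered = np.zeros_like(img)
    for y in range(H):                     # naive O(H*W*size^2) convolution
        for x in range(W):
            filtered[y, x] = (padded[y:y + size, x:x + size] * k).sum()
    filtered = (filtered - filtered.min()) / (filtered.max() - filtered.min() + 1e-8)
    win = k / k.max()                      # averaging window, max 1 at center
    mask = np.zeros_like(img)
    n_centers = rng.randint(3, 3 + int(10 * complexity) + 1)
    for _ in range(n_centers):
        cy, cx = rng.randint(0, H), rng.randint(0, W)
        y0, y1 = max(0, cy - pad), min(H, cy + pad + 1)
        x0, x1 = max(0, cx - pad), min(W, cx + pad + 1)
        # add the window, clipped to the image borders
        mask[y0:y1, x0:x1] += win[pad - (cy - y0):pad + (y1 - cy),
                                  pad - (cx - x0):pad + (x1 - cx)]
    return (img + filtered * mask) / (mask + 1)
```

Since the result is a pixel-wise weighted average of two images in $[0,1]$, the output stays in $[0,1]$.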

\newpage

\end{center}
\end{wrapfigure}
%\end{minipage}%
%\hspace{-0cm}\begin{minipage}[t]{0.86\linewidth}
%\vspace*{-20mm}
This module {\bf permutes neighbouring pixels}. It first selects a
fraction $\frac{complexity}{3}$ of pixels randomly in the image. Each
of these pixels is then sequentially exchanged with a random pixel
among its four nearest neighbors (on its left, right, top or bottom).
This module is skipped with probability 80\%.\\
\vspace*{1mm}
\end{minipage}
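A minimal sketch of the pixel permutation (swaps that would fall outside the image are simply skipped, which is an assumption):

```python
import numpy as np

def permute_pixels(img, complexity, rng):
    """Exchange a fraction complexity/3 of pixels with a random one of
    their four nearest neighbours (left, right, top, bottom)."""
    out = img.copy()
    H, W = out.shape
    n = int(round(complexity / 3.0 * H * W))
    ys = rng.randint(0, H, size=n)
    xs = rng.randint(0, W, size=n)
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    for y, x in zip(ys, xs):
        dy, dx = offsets[rng.randint(4)]
        ny, nx = y + dy, x + dx
        if 0 <= ny < H and 0 <= nx < W:   # skip swaps outside the image
            out[y, x], out[ny, nx] = out[ny, nx], out[y, x]
    return out
```

Because the module only swaps pixels, the multiset of pixel values is preserved exactly.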

\vspace{-3mm}
by an amount controlled by $complexity$.
This module is skipped with probability 85\%. The probabilities
of applying 1, 2, or 3 patches are (50\%,30\%,20\%).
\end{minipage}

\vspace*{1mm}

\begin{minipage}[t]{0.25\linewidth}
\centering
\hspace*{-16mm}\includegraphics[scale=.4]{images/Contrast_only.png}\\
{\bf Grey Level \& Contrast}
\end{minipage}%
\hspace{-12mm}\begin{minipage}[t]{0.82\linewidth}
\vspace*{-18mm}
The {\bf grey level and contrast} module changes the contrast by changing grey levels, and may invert the image polarity (white
to black and black to white). The contrast is $C \sim U[1-0.85 \times complexity,1]$
so the image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
polarity is inverted with probability 50\%.
%\vspace{.7cm}
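This rescaling is short enough to sketch directly (assuming input pixel values already lie in $[0,1]$):

```python
import numpy as np

def grey_contrast(img, complexity, rng):
    """Rescale values into [(1-C)/2, 1-(1-C)/2] with
    C ~ U[1-0.85*complexity, 1]; invert polarity half the time."""
    C = rng.uniform(1 - 0.85 * complexity, 1)
    out = (1 - C) / 2.0 + C * img   # assumes input in [0, 1]
    if rng.rand() < 0.5:
        out = 1.0 - out             # polarity inversion
    return out
```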
(bottom right) is used as a training example.}
\label{fig:pipeline}
\end{figure}
\fi

\vspace*{-3mm}
\section{Experimental Setup}
\vspace*{-1mm}

Much previous work on deep learning had been performed on
the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009},
495 the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009}, 499 the MNIST digits task~\citep{Hinton06,ranzato-07-small,Bengio-nips-2006,Salakhutdinov+Hinton-2009},