comparison writeup/nips2010_submission.tex @ 541:8aad1c6ec39a

space reduction
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Wed, 02 Jun 2010 10:23:33 -0400
parents 84f42fe05594
children 1cdfc17e890f
540:269c39f55134 541:8aad1c6ec39a
  141 141  There are two main parts in the pipeline. The first one,
  142 142  from slant to pinch below, performs transformations. The second
  143 143  part, from blur to contrast, adds different kinds of noise.
  144 144
  145 145  \begin{figure}[ht]
+     146  \vspace*{-2mm}
  146 147  \centerline{\resizebox{.9\textwidth}{!}{\includegraphics{images/transfo.png}}}
  147 148  % TODO: PUT THE NAME OF THE TRANSFORMATION NEXT TO EACH IMAGE
  148 149  \caption{Illustration of each transformation applied alone to the same image
  149 150  of an upper-case H (top left). First row (from left to right): original image, slant,
  150 151  thickness, affine transformation (translation, rotation, shear),
  151 152  local elastic deformation; second row (from left to right):
  152 153  pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right):
  153 154  background image, salt and pepper noise, spatially Gaussian noise, scratches,
  154 155  grey level and contrast changes.}
  155 156  \label{fig:transfo}
+     157  \vspace*{-2mm}
  156 158  \end{figure}
  157 159
  158 160  {\large\bf Transformations}
  159 161
- 160      \vspace*{2mm}
+     162  \vspace*{0.5mm}
  161 163
  162 164  {\bf Slant.}
- 163      We mimic slant by shifting each row of the image
+     165  Each row of the image is shifted
  164 166  proportionally to its height: $shift = round(slant \times height)$.
- 165      The $slant$ coefficient can be negative or positive with equal probability
- 166      and its value is randomly sampled according to the complexity level:
- 167      $slant \sim U[0,complexity]$, so the
- 168      maximum displacement for the lowest or highest pixel line is of
- 169      $round(complexity \times 32)$.
- 170      \vspace*{0mm}
+     167  $slant \sim U[-complexity,complexity]$.
+     168  \vspace*{-1mm}
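As an illustration of the slant transformation, the NumPy sketch below shifts row `height` by `round(slant * height)` pixels, with `slant ~ U[-complexity, complexity]`. It is not the authors' code and the function names are hypothetical; measuring `height` from the top row, dropping pixels pushed past the border and zero-filling the vacated ones are our assumptions.

```python
import numpy as np


def shift_rows(image: np.ndarray, slant: float) -> np.ndarray:
    """Shift row `height` horizontally by round(slant * height) pixels.

    Assumptions (not stated in the text): `height` is the row index counted
    from the top, pixels pushed past the border are dropped, and vacated
    pixels are filled with 0 (background).
    """
    n_rows, width = image.shape
    out = np.zeros_like(image)
    for height in range(n_rows):
        # Clamp so that both slices below always have matching lengths.
        shift = max(-width, min(width, int(round(slant * height))))
        if shift >= 0:
            out[height, shift:] = image[height, : width - shift]
        else:
            out[height, : width + shift] = image[height, -shift:]
    return out


def random_slant(image: np.ndarray, complexity: float, rng: np.random.Generator) -> np.ndarray:
    """Sample slant ~ U[-complexity, complexity] and apply it."""
    return shift_rows(image, rng.uniform(-complexity, complexity))
```

Note that Python's `round` rounds halves to even, which only matters for shifts that land exactly on .5.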
  171 169
  172 170  {\bf Thickness.}
  173 171  Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
  174 172  are applied. The neighborhood of each pixel is multiplied
  175 173  element-wise with a {\em structuring element} matrix.
  176 174  The pixel value is replaced by the maximum or the minimum of the resulting
  177 175  matrix, respectively for dilation or erosion. Ten different structuring elements with
  178 176  increasing dimensions (largest is $5\times5$) were used. For each image, we
  179 177  randomly sample the operator type (dilation or erosion) with equal probability and one structuring
- 180      element from a subset of the $n$ smallest structuring elements where $n$ is
- 181      $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$
- 182      for erosion. A neutral element is always present in the set, and if it is
- 183      chosen no transformation is applied. Erosion allows only the six
- 184      smallest structural elements because when the character is too thin it may
- 185      be completely erased.
- 186      \vspace*{0mm}
+     178  element from a subset of the $n=round(m \times complexity)$ smallest structuring elements,
+     179  where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters).
+     180  A neutral element (no transformation)
+     181  is always present in the set.
+     182  \vspace*{-1mm}
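A minimal NumPy sketch of the thickness step, assuming flat (all-ones) rectangular structuring elements, for which the multiply-then-max/min description reduces to a max or min over the window. The list `STRUCTURING_SHAPES` is hypothetical: the text only says there are ten elements of increasing size, the largest being 5×5, and that a neutral element is always available.

```python
import numpy as np

# Hypothetical list of ten all-ones structuring elements of increasing size;
# index 0 is the neutral 1x1 element (no transformation).
STRUCTURING_SHAPES = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3),
                      (3, 2), (3, 3), (4, 4), (4, 5), (5, 5)]


def morphology(image, shape, dilate):
    """Flat grey-level dilation (max) or erosion (min) over a kh x kw window."""
    kh, kw = shape
    top, left = (kh - 1) // 2, (kw - 1) // 2
    padded = np.pad(image, ((top, kh - 1 - top), (left, kw - 1 - left)), mode="edge")
    h, w = image.shape
    reduce = np.maximum if dilate else np.minimum
    out = padded[0:h, 0:w].copy()
    for di in range(kh):
        for dj in range(kw):
            out = reduce(out, padded[di:di + h, dj:dj + w])
    return out


def random_thickness(image, complexity, rng):
    dilate = rng.random() < 0.5                  # dilation or erosion with equal probability
    m = 10 if dilate else 6                      # erosion only uses the 6 smallest elements
    n = max(1, int(round(m * complexity)))       # n = round(m * complexity); keep at least the neutral one
    shape = STRUCTURING_SHAPES[rng.integers(n)]  # uniform among the n smallest
    return morphology(image, shape, dilate)
```

Reading "a subset of the $n$ smallest" as a uniform choice among the first $n$ entries, and forcing $n \ge 1$ at zero complexity, are further assumptions of this sketch.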
  187 183
  188 184  {\bf Affine Transformations.}
  189 185  A $2 \times 3$ affine transform matrix (with
  190 186  6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
- 191      Each pixel $(x,y)$ of the output image takes the value of the pixel
- 192      nearest to $(ax+by+c,dx+ey+f)$ in the input image. This
- 193      produces scaling, translation, rotation and shearing.
+     187  Output pixel $(x,y)$ takes the value of the input pixel
+     188  nearest to $(ax+by+c,dx+ey+f)$,
+     189  producing scaling, translation, rotation and shearing.
  194 190  The marginal distributions of $(a,b,c,d,e,f)$ have been tuned by hand to
  195 191  forbid large rotations (so as not to confuse classes) but to give good
  196 192  variability of the transformation: $a$ and $d$ $\sim U[1-3 \times
  197 193  complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3
  198 194  \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times
  199 195  complexity]$.
- 200      \vspace*{0mm}
+     196  \vspace*{-1mm}
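The nearest-neighbour mapping can be sketched as follows. Taking x as the column index and giving 0 to source positions outside the image are assumptions. The sampler is also an assumption: it draws the two scale coefficients from $U[1-3 \times complexity, 1+3 \times complexity]$ and places them on the diagonal $(a, e)$ of the mapping as written, with shears $(b, d)$ from $U[-3 \times complexity, 3 \times complexity]$, so that zero complexity gives the identity.

```python
import numpy as np


def affine_nearest(image, a, b, c, d, e, f):
    """Output pixel (x, y) takes the input pixel nearest to (ax+by+c, dx+ey+f).

    x is the column index and y the row index (an assumption); source
    positions outside the image give 0.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    src_x = np.rint(a * xs + b * ys + c).astype(int)
    src_y = np.rint(d * xs + e * ys + f).astype(int)
    inside = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(image)
    out[inside] = image[src_y[inside], src_x[inside]]
    return out


def random_affine(image, complexity, rng):
    k = complexity
    # Assumption: the scale terms sampled around 1 are the diagonal (a, e)
    # of the mapping above; the shear terms are (b, d).
    a, e = rng.uniform(1 - 3 * k, 1 + 3 * k, size=2)
    b, d = rng.uniform(-3 * k, 3 * k, size=2)
    c, f = rng.uniform(-4 * k, 4 * k, size=2)   # translations, in pixels
    return affine_nearest(image, a, b, c, d, e, f)
```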
  201 197
  202 198  {\bf Local Elastic Deformations.}
  203 199  This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short},
  204 200  which provides more details.
- 205      Two ``displacements'' fields are generated and applied, for horizontal
- 206      and vertical displacements of pixels.
- 207      To generate a pixel in either field, first a value between -1 and 1 is
- 208      chosen from a uniform distribution. Then all the pixels, in both fields, are
- 209      multiplied by a constant $\alpha$ which controls the intensity of the
- 210      displacements (larger $\alpha$ translates into larger wiggles).
- 211      Each field is convoluted with a Gaussian 2D kernel of
- 212      standard deviation $\sigma$. Visually, this results in a blur.
- 213      $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times
- 214      \sqrt[3]{complexity}$.
- 215      \vspace*{0mm}
+     201  The intensity of the horizontal and vertical displacement fields is given by
+     202  $\alpha = \sqrt[3]{complexity} \times 10.0$; the fields are
+     203  convolved with a Gaussian 2D kernel (resulting in a blur) of
+     204  standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$.
+     205  \vspace*{-1mm}
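A sketch of the elastic deformation following the procedure spelled out in the replaced lines: one uniform $[-1,1]$ value per pixel in each of the two displacement fields, scaling by $\alpha$, Gaussian smoothing with standard deviation $\sigma$, then resampling the image at the displaced positions. Zero padding for the smoothing, bilinear resampling with 0 outside the image, and the helper names are our assumptions.

```python
import numpy as np


def gaussian_smooth(field, sigma):
    """Separable Gaussian blur with zero padding; output has the input's shape."""
    r = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-r, r + 1)
    kernel = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel /= kernel.sum()

    def blur_1d(v):
        return np.convolve(np.pad(v, r), kernel, mode="valid")

    return np.apply_along_axis(blur_1d, 0, np.apply_along_axis(blur_1d, 1, field))


def bilinear(image, src_y, src_x):
    """Bilinear interpolation at non-integer positions; outside pixels count as 0."""
    h, w = image.shape
    y0, x0 = np.floor(src_y).astype(int), np.floor(src_x).astype(int)
    wy, wx = src_y - y0, src_x - x0

    def at(yy, xx):
        inside = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        vals = np.zeros(yy.shape)
        vals[inside] = image[yy[inside], xx[inside]]
        return vals

    return ((1 - wy) * (1 - wx) * at(y0, x0) + (1 - wy) * wx * at(y0, x0 + 1)
            + wy * (1 - wx) * at(y0 + 1, x0) + wy * wx * at(y0 + 1, x0 + 1))


def elastic_deform(image, alpha, sigma, rng):
    h, w = image.shape
    # One field per axis: U[-1, 1] per pixel, scaled by alpha, then blurred.
    dx = gaussian_smooth(alpha * rng.uniform(-1, 1, size=(h, w)), sigma)
    dy = gaussian_smooth(alpha * rng.uniform(-1, 1, size=(h, w)), sigma)
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    return bilinear(image, ys + dy, xs + dx)


def random_elastic(image, complexity, rng):
    cbrt = np.cbrt(complexity)
    return elastic_deform(image, alpha=10.0 * cbrt, sigma=10.0 - 7.0 * cbrt, rng=rng)
```

Because smoothing is linear, scaling by $\alpha$ before or after the convolution gives the same field.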
  216 206
  217 207  {\bf Pinch.}
- 218      This is a GIMP filter called ``Whirl and
- 219      pinch'', but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic
+     208  This is the ``Whirl and pinch'' GIMP filter with whirl set to 0.
+     209  A pinch is ``similar to projecting the image onto an elastic
  220 210  surface and pressing or pulling on the center of the surface'' (GIMP documentation manual).
  221 211  For a square input image, this is akin to drawing a circle of
  222 212  radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
  223 213  that disk (the region inside the circle) will have its value recalculated by taking
  224 214  the value of another ``source'' pixel in the original image. The position of
  228 218  d_1$, where $pinch$ is a parameter to the filter.
  229 219  The actual value is given by bilinear interpolation considering the pixels
  230 220  around the (non-integer) source position thus found.
  231 221  Here $pinch \sim U[-complexity, 0.7 \times complexity]$.
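The source-pixel lookup with bilinear interpolation can be sketched as below. The radial mapping $d_2 = \sin(\pi d_1 / 2r)^{-pinch} \times d_1$ used here is an assumption modelled on GIMP-style pinch filters, not a reconstruction of the formula in the lines elided above; taking $C$ as the image center and $r$ as half the side are also assumptions. With this mapping a positive `amount` samples farther from $C$ (pinch), a negative one samples closer (bulge), and pixels at $C$ or outside the disk are unchanged.

```python
import numpy as np


def bilinear(image, src_y, src_x):
    """Bilinear interpolation at non-integer positions; outside pixels count as 0."""
    h, w = image.shape
    y0, x0 = np.floor(src_y).astype(int), np.floor(src_x).astype(int)
    wy, wx = src_y - y0, src_x - x0

    def at(yy, xx):
        inside = (yy >= 0) & (yy < h) & (xx >= 0) & (xx < w)
        vals = np.zeros(yy.shape)
        vals[inside] = image[yy[inside], xx[inside]]
        return vals

    return ((1 - wy) * (1 - wx) * at(y0, x0) + (1 - wy) * wx * at(y0, x0 + 1)
            + wy * (1 - wx) * at(y0 + 1, x0) + wy * wx * at(y0 + 1, x0 + 1))


def pinch(image, amount):
    """Radial pinch inside the disk inscribed in the image (assumed mapping)."""
    h, w = image.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0   # center C
    r = min(h, w) / 2.0                     # radius of the disk
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    dy, dx = ys - cy, xs - cx
    d1 = np.hypot(dy, dx)                   # distance |CP|
    scale = np.ones_like(d1)                # d2 / d1; 1 means "take the pixel itself"
    inside = (d1 > 0) & (d1 < r)
    scale[inside] = np.sin(np.pi * d1[inside] / (2 * r)) ** (-amount)
    return bilinear(image, cy + dy * scale, cx + dx * scale)


def random_pinch(image, complexity, rng):
    return pinch(image, rng.uniform(-complexity, 0.7 * complexity))
```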
  232 222
- 233      \vspace*{1mm}
+     223  \vspace*{0.5mm}
  234 224
  235 225  {\large\bf Injecting Noise}
  236 226
- 237      \vspace*{1mm}
+     227  \vspace*{0.5mm}
  238 228
  239 229  {\bf Motion Blur.}
  240 230  This is a ``linear motion blur'' in GIMP
  241 231  terminology, with two parameters, $length$ and $angle$. The value of
  242 232  a pixel in the final image is approximately the mean value of the first $length$ pixels
  243 233  found by moving in the $angle$ direction.
  244 234  Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.
- 245      \vspace*{0mm}
+     235  \vspace*{-1mm}
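A minimal sketch of this averaging, not GIMP's implementation. Counting the pixel itself as the first of the $length$ pixels, using $|length|$ rounded to at least 1 (since a normal sample can be negative), measuring the angle counter-clockwise from the $+x$ axis with rows growing downwards, nearest-pixel sampling and clamping at the border are all assumptions.

```python
import numpy as np


def motion_blur(image, length, angle_deg):
    """Average the first `length` pixels met when moving from each pixel along `angle_deg`."""
    h, w = image.shape
    n = max(1, int(round(abs(length))))        # number of pixels averaged (assumption)
    theta = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    acc = np.zeros((h, w))
    for k in range(n):
        # k-th step along the direction; row index decreases when moving "up".
        sy = np.clip(np.rint(ys - k * np.sin(theta)).astype(int), 0, h - 1)
        sx = np.clip(np.rint(xs + k * np.cos(theta)).astype(int), 0, w - 1)
        acc += image[sy, sx]
    return acc / n


def random_motion_blur(image, complexity, rng):
    angle = rng.uniform(0, 360)
    length = rng.normal(0.0, 3 * complexity)   # standard deviation 3 * complexity
    return motion_blur(image, length, angle)
```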
  246 236
  247 237  {\bf Occlusion.}
  248 238  This filter selects a random rectangle from an {\em occluder} character
  249 239  image and places it over the original {\em occluded} character
  250 240  image. Pixels are combined by taking max(occluder,occluded),
  251 241  i.e.\ the value closer to black. The rectangle corners
  252 242  are sampled so that larger complexity gives larger rectangles.
  253 243  The destination position in the occluded image is also sampled
  254 244  according to a normal distribution (see more details in~\citet{ift6266-tr-anonymous}).
  255 245  This filter has a probability of 60\% of not being applied.
- 256      \vspace*{0mm}
+     246  \vspace*{-1mm}
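In the sketch below only the max-combination and the 60% skip rate come from the text. The rectangle-size and destination distributions in `random_occlusion` are hypothetical placeholders for those described in the cited report, chosen only so that larger complexity tends to give larger rectangles.

```python
import numpy as np


def paste_max(occluded, occluder, box, dest):
    """Place occluder[y0:y1, x0:x1] with its top-left corner at dest=(row, col).

    Overlapping pixels take max(occluder, occluded); parts falling outside
    the occluded image are cropped.
    """
    y0, x0, y1, x1 = box
    patch = occluder[y0:y1, x0:x1]
    out = occluded.copy()
    h, w = out.shape
    dy, dx = dest
    ty0, tx0 = max(dy, 0), max(dx, 0)
    ty1, tx1 = min(dy + patch.shape[0], h), min(dx + patch.shape[1], w)
    if ty0 < ty1 and tx0 < tx1:
        sub = patch[ty0 - dy:ty1 - dy, tx0 - dx:tx1 - dx]
        out[ty0:ty1, tx0:tx1] = np.maximum(out[ty0:ty1, tx0:tx1], sub)
    return out


def random_occlusion(occluded, occluder, complexity, rng, p_skip=0.6):
    if rng.random() < p_skip:                  # not applied 60% of the time
        return occluded.copy()
    h, w = occluder.shape
    # Hypothetical: rectangle sides grow with complexity.
    rh = max(1, min(h, int(round(rng.uniform(0, complexity) * h))))
    rw = max(1, min(w, int(round(rng.uniform(0, complexity) * w))))
    y0, x0 = rng.integers(0, h - rh + 1), rng.integers(0, w - rw + 1)
    # Hypothetical: destination drawn from a normal centred on the image.
    dy = int(round(rng.normal((h - rh) / 2, h / 8)))
    dx = int(round(rng.normal((w - rw) / 2, w / 8)))
    return paste_max(occluded, occluder, (y0, x0, y0 + rh, x0 + rw), (dy, dx))
```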
  257 247
  258 248  {\bf Pixel Permutation.}
  259 249  This filter permutes neighbouring pixels. It first selects
  260 250  $\frac{complexity}{3}$ pixels randomly in the image. Each of them is then
  261 251  sequentially exchanged with one other pixel in its $V4$ neighbourhood. The numbers
  262 252  of exchanges to the left, right, top and bottom are equal, or differ
  263 253  by at most 1 if the number of selected pixels is not a multiple of 4.
  264 254  % TODO: The previous sentence is hard to parse
  265 255  This filter has a probability of 80\% of not being applied.
- 266      \vspace*{0mm}
+     256  \vspace*{-1mm}
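A sketch of the balanced neighbour swaps. Reading $\frac{complexity}{3}$ as the fraction of pixels selected, and skipping a swap whose neighbour falls outside the image, are assumptions. Balance is obtained by assigning directions from a random permutation of $0,\dots,k-1$ taken modulo 4, which uses each direction $\lfloor k/4 \rfloor$ or $\lceil k/4 \rceil$ times.

```python
import numpy as np

V4_MOVES = [(0, -1), (0, 1), (-1, 0), (1, 0)]   # left, right, up, down


def permute_pixels(image, complexity, rng, p_skip=0.8):
    """Swap randomly selected pixels with one of their V4 neighbours."""
    out = image.copy()
    if rng.random() < p_skip:                    # not applied 80% of the time
        return out
    h, w = out.shape
    k = int(round(complexity / 3 * h * w))       # assumption: complexity/3 is a fraction of pixels
    if k == 0:
        return out
    chosen = rng.choice(h * w, size=k, replace=False)
    # Residues modulo 4 of a random permutation of 0..k-1 are balanced.
    for order, flat in zip(rng.permutation(k), chosen):
        y, x = divmod(int(flat), w)
        dy, dx = V4_MOVES[order % 4]
        ny, nx = y + dy, x + dx
        if 0 <= ny < h and 0 <= nx < w:          # skip swaps that leave the image
            out[y, x], out[ny, nx] = out[ny, nx], out[y, x]
    return out
```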
  267 257
  268 258  {\bf Gaussian Noise.}
  269 259  This filter simply adds, to each pixel of the image independently, a
  270 260  noise $\sim {\rm Normal}(0,(\frac{complexity}{10})^2)$.
  271 261  It has a probability of 70\% of not being applied.
- 272      \vspace*{0mm}
+     262  \vspace*{-1mm}
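For completeness, the additive noise reduces to one NumPy call. Not clipping the result back to the pixel range is an assumption, since the text does not mention clipping.

```python
import numpy as np


def gaussian_noise(image, complexity, rng, p_skip=0.7):
    """Add i.i.d. Normal(0, (complexity/10)^2) noise to every pixel.

    Skipped with probability 0.7; no clipping is applied (assumption).
    """
    if rng.random() < p_skip:
        return image.copy()
    return image + rng.normal(0.0, complexity / 10.0, size=image.shape)
```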
  273 263
  274 264  {\bf Background Images.}
  275 265  Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
  276 266  background behind the letter. The background is chosen by first selecting,
  277 267  at random, an image from a set of images. Then a 32$\times$32 sub-region
  282 272  intensity) for both the original image and the background image, $maximage$
  283 273  and $maxbg$. We also have a parameter $contrast \sim U[complexity, 1]$.
  284 274  Each background pixel value is multiplied by $\frac{max(maximage -
  285 275  contrast, 0)}{maxbg}$ (higher contrast yields a darker
  286 276  background). The output image pixels are max(background,original).
- 287      \vspace*{0mm}
+     277  \vspace*{-1mm}
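The contrast-scaling rule translates directly into code. This sketch assumes the 32×32 background crop has already been extracted, since that step is only partly visible above; the guard for an all-zero background is our addition.

```python
import numpy as np


def add_background(image, background, contrast):
    """Scale a same-size background crop and merge it with a pixel-wise max.

    Each background pixel is multiplied by max(maximage - contrast, 0) / maxbg.
    """
    maximage, maxbg = image.max(), background.max()
    if maxbg <= 0:                     # guard (assumption): nothing to add
        return image.copy()
    scaled = background * max(maximage - contrast, 0.0) / maxbg
    return np.maximum(scaled, image)


def random_background(image, background, complexity, rng):
    return add_background(image, background, rng.uniform(complexity, 1.0))
```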
  288 278
  289 279  {\bf Salt and Pepper Noise.}
  290 280  This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
  291 281  The number of selected pixels is $0.2 \times complexity$.
  292 282  This filter has a probability of 75\% of not being applied at all.
- 293      \vspace*{0mm}
+     283  \vspace*{-1mm}
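A sketch under two assumptions: $0.2 \times complexity$ is used as the per-pixel selection probability (the expected fraction of selected pixels), and selected pixels are overwritten with $U[0,1]$ values rather than incremented.

```python
import numpy as np


def salt_and_pepper(image, complexity, rng, p_skip=0.75):
    """Overwrite a random subset of pixels with U[0, 1) values."""
    out = image.copy()
    if rng.random() < p_skip:                        # not applied 75% of the time
        return out
    selected = rng.random(image.shape) < 0.2 * complexity
    out[selected] = rng.random(int(selected.sum()))  # one fresh U[0, 1) value per selected pixel
    return out
```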
  294 284
  295 285  {\bf Spatially Gaussian Noise.}
  296 286  Different regions of the image are spatially smoothed.
  297 287  The image is convolved with a symmetric Gaussian kernel of
  298 288  size and variance chosen uniformly in the ranges $[12,12 + 20 \times
  304 294  initialize to zero a mask matrix of the image size. For each selected pixel
  305 295  we add to the mask the averaging window centered on it. The final image is
  306 296  computed from the following element-wise operation: $\frac{image + filtered
  307 297  image \times mask}{mask+1}$.
  308 298  This filter has a probability of 75\% of not being applied at all.
- 309      \vspace*{0mm}
+     299  \vspace*{-1mm}
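The final blending formula can be sketched as below. How pixels are selected and the exact kernel and window sizes are partly elided above, so `spatial_blend` takes the filtered image, the selected centers and the averaging window as inputs; `gaussian_blur` is a generic separable blur with edge padding, not the paper's kernel-sampling code.

```python
import numpy as np


def gaussian_blur(image, sigma):
    """Separable Gaussian blur with edge padding."""
    r = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    padded = np.pad(image, r, mode="edge")
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)


def spatial_blend(image, filtered, centers, window):
    """Return (image + filtered * mask) / (mask + 1), element-wise.

    `mask` starts at zero and accumulates a copy of `window` centred on each
    selected pixel (cropped at the image border).
    """
    h, w = image.shape
    kh, kw = window.shape
    mask = np.zeros((h, w))
    for cy, cx in centers:
        y0, x0 = cy - kh // 2, cx - kw // 2
        ys0, xs0 = max(y0, 0), max(x0, 0)
        ys1, xs1 = min(y0 + kh, h), min(x0 + kw, w)
        if ys0 < ys1 and xs0 < xs1:
            mask[ys0:ys1, xs0:xs1] += window[ys0 - y0:ys1 - y0, xs0 - x0:xs1 - x0]
    return (image + filtered * mask) / (mask + 1.0)
```

Where the mask is zero the pixel is untouched; where it is large the output approaches the filtered image.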
  310 300
  311 301  {\bf Scratches.}
  312 302  The scratches module places line-like white patches on the image. The
  313 303  lines are heavily transformed images of the digit ``1'' (one), chosen
  314 304  at random among five thousand such images. The ``1'' image is
  320 310  This filter is applied only 15\% of the time. When it is applied, 50\%
  321 311  of the time, only one patch image is generated and applied. In 30\% of
  322 312  cases, two patches are generated, and otherwise three patches are
  323 313  generated. The patches are applied by taking, at each of the 32$\times$32 pixel locations,
  324 314  the maximum over the patches and the original image.
- 325      \vspace*{0mm}
+     315  \vspace*{-1mm}
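The application logic (15% activation, one, two or three patches with probabilities 0.5, 0.3 and 0.2, pixel-wise maximum) can be sketched as follows. `make_patch` is a hypothetical callable standing in for the patch generator built from transformed "1" images, which is partly elided above and not reproduced here.

```python
import numpy as np


def apply_scratches(image, make_patch, rng, p_apply=0.15):
    """Merge 1, 2 or 3 scratch patches into `image` with a pixel-wise max.

    `make_patch(rng)` is hypothetical and must return an array shaped like `image`.
    """
    if rng.random() >= p_apply:                      # applied only 15% of the time
        return image.copy()
    n = rng.choice([1, 2, 3], p=[0.5, 0.3, 0.2])     # one patch 50%, two 30%, three 20%
    patches = [make_patch(rng) for _ in range(n)]
    return np.maximum.reduce([image] + patches)
```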
  326 316
  327 317  {\bf Grey Level and Contrast Changes.}
  328 318  This filter changes the contrast and may invert the image polarity (white
  329 319  on black to black on white). The contrast $C$ is defined here as the
  330 320  difference between the maximum and the minimum pixel value of the image.
  485 475  rate was chosen among $\{0.001, 0.01, 0.025, 0.075, 0.1, 0.5\}$
  486 476  through preliminary experiments (measuring performance on a validation set),
  487 477  and $0.1$ was then selected for optimizing on the whole training sets.
  488 478
  489 479  \begin{figure}[ht]
+     480  \vspace*{-2mm}
  490 481  \centerline{\resizebox{0.8\textwidth}{!}{\includegraphics{images/denoising_autoencoder_small.pdf}}}
  491 482  \caption{Illustration of the computations and training criterion for the denoising
  492 483  auto-encoder used to pre-train each layer of the deep architecture. Input $x$ of
  493 484  the layer (i.e. raw input or output of previous layer)
  494 485  is corrupted into $\tilde{x}$ and encoded into code $y$ by the encoder $f_\theta(\cdot)$.
  495 486  The decoder $g_{\theta'}(\cdot)$ maps $y$ to reconstruction $z$, which
  496 487  is compared to the uncorrupted input $x$ through the loss function
  497 488  $L_H(x,z)$, whose expected value is approximately minimized during training
  498 489  by tuning $\theta$ and $\theta'$.}
  499 490  \label{fig:da}
+     491  \vspace*{-2mm}
  500 492  \end{figure}
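A minimal NumPy sketch of the computations in this figure for a single example: corruption $x \to \tilde{x}$, encoder $y = f_\theta(\tilde{x})$, decoder $z = g_{\theta'}(y)$, and the cross-entropy $L_H(x,z)$ against the clean input, minimized by stochastic gradient descent. Masking-noise corruption, sigmoid encoder and decoder, untied weights and the initialization range are assumptions made for illustration; this excerpt does not specify them.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def cross_entropy(x, z, eps=1e-12):
    """L_H(x, z) = -sum_k [x_k log z_k + (1 - x_k) log(1 - z_k)] for x, z in [0, 1]."""
    z = np.clip(z, eps, 1.0 - eps)
    return -np.sum(x * np.log(z) + (1.0 - x) * np.log(1.0 - z))


class DenoisingAutoencoder:
    """One layer: x -> x_tilde -> y = f_theta(x_tilde) -> z = g_theta'(y)."""

    def __init__(self, n_in, n_hidden, rng):
        bound = 1.0 / np.sqrt(n_in)
        self.W = rng.uniform(-bound, bound, size=(n_hidden, n_in))   # encoder weights (theta)
        self.b = np.zeros(n_hidden)
        self.W2 = rng.uniform(-bound, bound, size=(n_in, n_hidden))  # decoder weights (theta')
        self.c = np.zeros(n_in)
        self.rng = rng

    def corrupt(self, x, level):
        """Masking noise (assumed): each input is set to 0 with probability `level`."""
        return x * (self.rng.random(x.shape) >= level)

    def forward(self, x_tilde):
        y = sigmoid(self.W @ x_tilde + self.b)
        z = sigmoid(self.W2 @ y + self.c)
        return y, z

    def sgd_step(self, x, level, lr):
        """One gradient step on L_H(x, z); returns the loss before the update."""
        x_tilde = self.corrupt(x, level)
        y, z = self.forward(x_tilde)
        loss = cross_entropy(x, z)            # compared to the *uncorrupted* x
        dz = z - x                            # dL/d(decoder pre-activation)
        dy = (self.W2.T @ dz) * y * (1 - y)   # dL/d(encoder pre-activation)
        self.W2 -= lr * np.outer(dz, y)
        self.c -= lr * dz
        self.W -= lr * np.outer(dy, x_tilde)
        self.b -= lr * dy
        return loss
```

Repeating `sgd_step` over a training set trains one layer; for pre-training a deep network, the codes $y$ of a trained layer would serve as the input $x$ of the next one.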
  501 493
  502 494  {\bf Stacked Denoising Auto-Encoders (SDA).}
  503 495  Various auto-encoder variants and Restricted Boltzmann Machines (RBMs)
  504 496  can be used to initialize the weights of each layer of a deep MLP (with many hidden
  541 533  stacked denoising auto-encoders on MNIST~\citep{VincentPLarochelleH2008}.
  542 534
  543 535  \vspace*{-1mm}
  544 536
  545 537  \begin{figure}[ht]
+     538  \vspace*{-2mm}
  546 539  \centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}}
  547 540  \caption{Error bars indicate a 95\% confidence interval. 0 indicates that the model was trained
  548 541  on NIST, 1 on NISTP, and 2 on P07. Left: overall results
  549 542  of all models, on 3 different test sets (NIST, NISTP, P07).
  550 543  Right: error rates on NIST test digits only, along with the previous results from
  551 544  the literature~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005},
  552 545  respectively based on ART, nearest neighbors, MLPs, and SVMs.}
  553 546
  554 547  \label{fig:error-rates-charts}
- 555      \vspace*{-1mm}
+     548  \vspace*{-2mm}
  556 549  \end{figure}
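The caption does not say how the 95% intervals were computed. One common choice for a test error rate, shown here only as an assumption, is the normal approximation to the binomial, $\hat{p} \pm 1.96\sqrt{\hat{p}(1-\hat{p})/n}$ for $n$ test examples.

```python
import math


def error_rate_ci(n_errors, n_test, z=1.96):
    """Error rate with a normal-approximation (Wald) binomial confidence interval."""
    p = n_errors / n_test
    half_width = z * math.sqrt(p * (1.0 - p) / n_test)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)
```

The approximation is poor when the number of errors is very small; an exact or Wilson interval would then be preferable.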
  557 550
  558 551
  559 552  \section{Experimental Results}
  560 553