comparison writeup/nips2010_submission.tex @ 541:8aad1c6ec39a
space reduction
author | Yoshua Bengio <bengioy@iro.umontreal.ca> |
---|---|
date | Wed, 02 Jun 2010 10:23:33 -0400 |
parents | 84f42fe05594 |
children | 1cdfc17e890f |
540:269c39f55134 | 541:8aad1c6ec39a |
---|---|
141 There are two main parts in the pipeline. The first one, | 141 There are two main parts in the pipeline. The first one, |
142 from slant to pinch below, performs transformations. The second | 142 from slant to pinch below, performs transformations. The second |
143 part, from blur to contrast, adds different kinds of noise. | 143 part, from blur to contrast, adds different kinds of noise. |
144 | 144 |
145 \begin{figure}[ht] | 145 \begin{figure}[ht] |
| | 146 \vspace*{-2mm} |
146 \centerline{\resizebox{.9\textwidth}{!}{\includegraphics{images/transfo.png}}} | 147 \centerline{\resizebox{.9\textwidth}{!}{\includegraphics{images/transfo.png}}} |
147 % TODO: PUT THE NAME OF THE TRANSFORMATION NEXT TO EACH IMAGE | 148 % TODO: PUT THE NAME OF THE TRANSFORMATION NEXT TO EACH IMAGE |
148 \caption{Illustration of each transformation applied alone to the same image | 149 \caption{Illustration of each transformation applied alone to the same image |
149 of an upper-case H (top left). First row (from left to right): original image, slant, | 150 of an upper-case H (top left). First row (from left to right): original image, slant, |
150 thickness, affine transformation (translation, rotation, shear), | 151 thickness, affine transformation (translation, rotation, shear), |
151 local elastic deformation; second row (from left to right): | 152 local elastic deformation; second row (from left to right): |
152 pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right): | 153 pinch, motion blur, occlusion, pixel permutation, Gaussian noise; third row (from left to right): |
153 background image, salt and pepper noise, spatially Gaussian noise, scratches, | 154 background image, salt and pepper noise, spatially Gaussian noise, scratches, |
154 grey level and contrast changes.} | 155 grey level and contrast changes.} |
155 \label{fig:transfo} | 156 \label{fig:transfo} |
| | 157 \vspace*{-2mm} |
156 \end{figure} | 158 \end{figure} |
157 | 159 |
158 {\large\bf Transformations} | 160 {\large\bf Transformations} |
159 | 161 |
160 \vspace*{2mm} | 162 \vspace*{0.5mm} |
161 | 163 |
162 {\bf Slant.} | 164 {\bf Slant.} |
163 We mimic slant by shifting each row of the image | 165 Each row of the image is shifted |
164 proportionally to its height: $shift = round(slant \times height)$. | 166 proportionally to its height: $shift = round(slant \times height)$. |
165 The $slant$ coefficient can be negative or positive with equal probability | 167 $slant \sim U[-complexity,complexity]$. |
166 and its value is randomly sampled according to the complexity level: | 168 \vspace*{-1mm} |
167 $slant \sim U[0,complexity]$, so the | |
168 maximum displacement for the lowest or highest pixel line is | |
169 $round(complexity \times 32)$. | |
170 \vspace*{0mm} | |
171 | 169 |
172 {\bf Thickness.} | 170 {\bf Thickness.} |
173 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} | 171 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} |
174 are applied. The neighborhood of each pixel is multiplied | 172 are applied. The neighborhood of each pixel is multiplied |
175 element-wise with a {\em structuring element} matrix. | 173 element-wise with a {\em structuring element} matrix. |
176 The pixel value is replaced by the maximum or the minimum of the resulting | 174 The pixel value is replaced by the maximum or the minimum of the resulting |
177 matrix, respectively for dilation or erosion. Ten different structuring elements with | 175 matrix, respectively for dilation or erosion. Ten different structuring elements with |
178 increasing dimensions (largest is $5\times5$) were used. For each image, | 176 increasing dimensions (largest is $5\times5$) were used. For each image, |
179 we randomly sample the operator type (dilation or erosion) with equal probability and one structuring | 177 we randomly sample the operator type (dilation or erosion) with equal probability and one structuring |
180 element from a subset of the $n$ smallest structuring elements where $n$ is | 178 element from a subset of the $n=round(m \times complexity)$ smallest structuring elements |
181 $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$ | 179 where $m=10$ for dilation and $m=6$ for erosion (to avoid completely erasing thin characters). |
182 for erosion. A neutral element is always present in the set, and if it is | 180 A neutral element (no transformation) |
183 chosen, no transformation is applied. Erosion allows only the six | 181 is always present in the set. |
184 smallest structuring elements because when the character is too thin it may | 182 \vspace*{-1mm} |
185 be completely erased. | |
186 \vspace*{0mm} | |
187 | 183 |
188 {\bf Affine Transformations.} | 184 {\bf Affine Transformations.} |
189 A $2 \times 3$ affine transform matrix (with | 185 A $2 \times 3$ affine transform matrix (with |
190 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level. | 186 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level. |
191 Each pixel $(x,y)$ of the output image takes the value of the pixel | 187 Output pixel $(x,y)$ takes the value of input pixel |
192 nearest to $(ax+by+c,dx+ey+f)$ in the input image. This | 188 nearest to $(ax+by+c,dx+ey+f)$, |
193 produces scaling, translation, rotation and shearing. | 189 producing scaling, translation, rotation and shearing. |
194 The marginal distributions of $(a,b,c,d,e,f)$ have been tuned by hand to | 190 The marginal distributions of $(a,b,c,d,e,f)$ have been tuned by hand to |
195 forbid large rotations (so as not to confuse classes) while giving good | 191 forbid large rotations (so as not to confuse classes) while giving good |
196 variability of the transformation: $a$ and $e$ $\sim U[1-3 \times | 192 variability of the transformation: $a$ and $e$ $\sim U[1-3 \times |
197 complexity,1+3 \times complexity]$, $b$ and $d$ $\sim U[-3 \times complexity,3 | 193 complexity,1+3 \times complexity]$, $b$ and $d$ $\sim U[-3 \times complexity,3 |
198 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times | 194 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times |
199 complexity]$. | 195 complexity]$. |
200 \vspace*{0mm} | 196 \vspace*{-1mm} |
201 | 197 |
202 {\bf Local Elastic Deformations.} | 198 {\bf Local Elastic Deformations.} |
203 This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, | 199 This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03-short}, |
204 which provides more details. | 200 which provides more details. |
205 Two ``displacement'' fields are generated and applied, for horizontal | 201 The intensity of the displacement fields is given by |
206 and vertical displacements of pixels. | 202 $\alpha = \sqrt[3]{complexity} \times 10.0$; the fields are |
207 To generate a pixel in either field, first a value between -1 and 1 is | 203 convolved with a Gaussian 2D kernel (resulting in a blur) of |
208 chosen from a uniform distribution. Then all the pixels, in both fields, are | 204 standard deviation $\sigma = 10 - 7 \times\sqrt[3]{complexity}$. |
209 multiplied by a constant $\alpha$ which controls the intensity of the | 205 \vspace*{-1mm} |
210 displacements (larger $\alpha$ translates into larger wiggles). | |
211 Each field is convolved with a Gaussian 2D kernel of | |
212 standard deviation $\sigma$. Visually, this results in a blur. | |
213 $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times | |
214 \sqrt[3]{complexity}$. | |
215 \vspace*{0mm} | |
216 | 206 |
217 {\bf Pinch.} | 207 {\bf Pinch.} |
218 This is a GIMP filter called ``Whirl and | 208 This is the ``Whirl and pinch'' GIMP filter with whirl set to 0. |
219 pinch'', but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic | 209 A pinch is ``similar to projecting the image onto an elastic |
220 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). | 210 surface and pressing or pulling on the center of the surface'' (GIMP documentation manual). |
221 For a square input image, this is akin to drawing a circle of | 211 For a square input image, this is akin to drawing a circle of |
222 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to | 212 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to |
223 that disk (region inside circle) will have its value recalculated by taking | 213 that disk (region inside circle) will have its value recalculated by taking |
224 the value of another ``source'' pixel in the original image. The position of | 214 the value of another ``source'' pixel in the original image. The position of |
228 d_1$, where $pinch$ is a parameter to the filter. | 218 d_1$, where $pinch$ is a parameter to the filter. |
229 The actual value is given by bilinear interpolation considering the pixels | 219 The actual value is given by bilinear interpolation considering the pixels |
230 around the (non-integer) source position thus found. | 220 around the (non-integer) source position thus found. |
231 Here $pinch \sim U[-complexity, 0.7 \times complexity]$. | 221 Here $pinch \sim U[-complexity, 0.7 \times complexity]$. |
232 | 222 |
233 \vspace*{1mm} | 223 \vspace*{0.5mm} |
234 | 224 |
235 {\large\bf Injecting Noise} | 225 {\large\bf Injecting Noise} |
236 | 226 |
237 \vspace*{1mm} | 227 \vspace*{0.5mm} |
238 | 228 |
239 {\bf Motion Blur.} | 229 {\bf Motion Blur.} |
240 This is a ``linear motion blur'' in GIMP | 230 This is a ``linear motion blur'' in GIMP |
241 terminology, with two parameters, $length$ and $angle$. The value of | 231 terminology, with two parameters, $length$ and $angle$. The value of |
242 a pixel in the final image is approximately the mean value of the first $length$ pixels | 232 a pixel in the final image is approximately the mean value of the first $length$ pixels |
243 found by moving in the $angle$ direction. | 233 found by moving in the $angle$ direction. |
244 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$. | 234 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$. |
245 \vspace*{0mm} | 235 \vspace*{-1mm} |
246 | 236 |
247 {\bf Occlusion.} | 237 {\bf Occlusion.} |
248 This filter selects a random rectangle from an {\em occluder} character | 238 This filter selects a random rectangle from an {\em occluder} character |
249 image and places it over the original {\em occluded} character | 239 image and places it over the original {\em occluded} character |
250 image. Pixels are combined by taking the max(occluder,occluded), | 240 image. Pixels are combined by taking the max(occluder,occluded), |
251 closer to black. The rectangle corners | 241 closer to black. The rectangle corners |
252 are sampled so that larger complexity gives larger rectangles. | 242 are sampled so that larger complexity gives larger rectangles. |
253 The destination position in the occluded image is also sampled | 243 The destination position in the occluded image is also sampled |
254 according to a normal distribution (see more details in~\citet{ift6266-tr-anonymous}). | 244 according to a normal distribution (see more details in~\citet{ift6266-tr-anonymous}). |
255 This filter has a probability of 60\% of not being applied. | 245 This filter has a probability of 60\% of not being applied. |
256 \vspace*{0mm} | 246 \vspace*{-1mm} |
257 | 247 |
258 {\bf Pixel Permutation.} | 248 {\bf Pixel Permutation.} |
259 This filter permutes neighbouring pixels. It first selects | 249 This filter permutes neighbouring pixels. It first selects |
260 $\frac{complexity}{3}$ pixels randomly in the image. Each of them is then | 250 $\frac{complexity}{3}$ pixels randomly in the image. Each of them is then |
261 sequentially exchanged with one other pixel in its $V4$ neighbourhood. The numbers | 251 sequentially exchanged with one other pixel in its $V4$ neighbourhood. The numbers |
262 of exchanges to the left, right, top and bottom are equal, or differ | 252 of exchanges to the left, right, top and bottom are equal, or differ |
263 by at most 1 if the number of selected pixels is not a multiple of 4. | 253 by at most 1 if the number of selected pixels is not a multiple of 4. |
264 % TODO: The previous sentence is hard to parse | 254 % TODO: The previous sentence is hard to parse |
265 This filter has a probability of 80\% of not being applied. | 255 This filter has a probability of 80\% of not being applied. |
266 \vspace*{0mm} | 256 \vspace*{-1mm} |
267 | 257 |
268 {\bf Gaussian Noise.} | 258 {\bf Gaussian Noise.} |
269 This filter simply adds, to each pixel of the image independently, a | 259 This filter simply adds, to each pixel of the image independently, a |
270 noise $\sim {\rm Normal}(0,(\frac{complexity}{10})^2)$. | 260 noise $\sim {\rm Normal}(0,(\frac{complexity}{10})^2)$. |
271 It has a probability of 70\% of not being applied. | 261 It has a probability of 70\% of not being applied. |
272 \vspace*{0mm} | 262 \vspace*{-1mm} |
273 | 263 |
274 {\bf Background Images.} | 264 {\bf Background Images.} |
275 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random | 265 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random |
276 background behind the letter. The background is chosen by first selecting, | 266 background behind the letter. The background is chosen by first selecting, |
277 at random, an image from a set of images. Then a 32$\times$32 sub-region | 267 at random, an image from a set of images. Then a 32$\times$32 sub-region |
282 intensity) for both the original image and the background image, $maximage$ | 272 intensity) for both the original image and the background image, $maximage$ |
283 and $maxbg$. We also have a parameter $contrast \sim U[complexity, 1]$. | 273 and $maxbg$. We also have a parameter $contrast \sim U[complexity, 1]$. |
284 Each background pixel value is multiplied by $\frac{max(maximage - | 274 Each background pixel value is multiplied by $\frac{max(maximage - |
285 contrast, 0)}{maxbg}$ (higher contrast yields darker | 275 contrast, 0)}{maxbg}$ (higher contrast yields darker |
286 background). The output image pixels are max(background,original). | 276 background). The output image pixels are max(background,original). |
287 \vspace*{0mm} | 277 \vspace*{-1mm} |
288 | 278 |
289 {\bf Salt and Pepper Noise.} | 279 {\bf Salt and Pepper Noise.} |
290 This filter adds noise $\sim U[0,1]$ to random subsets of pixels. | 280 This filter adds noise $\sim U[0,1]$ to random subsets of pixels. |
291 The number of selected pixels is $0.2 \times complexity$. | 281 The number of selected pixels is $0.2 \times complexity$. |
292 This filter has a probability of 75\% of not being applied. | 282 This filter has a probability of 75\% of not being applied. |
293 \vspace*{0mm} | 283 \vspace*{-1mm} |
294 | 284 |
295 {\bf Spatially Gaussian Noise.} | 285 {\bf Spatially Gaussian Noise.} |
296 Different regions of the image are spatially smoothed. | 286 Different regions of the image are spatially smoothed. |
297 The image is convolved with a symmetric Gaussian kernel of | 287 The image is convolved with a symmetric Gaussian kernel of |
298 size and variance chosen uniformly in the ranges $[12,12 + 20 \times | 288 size and variance chosen uniformly in the ranges $[12,12 + 20 \times |
304 initialize to zero a mask matrix of the image size. For each selected pixel | 294 initialize to zero a mask matrix of the image size. For each selected pixel |
305 we add to the mask the averaging window centered on it. The final image is | 295 we add to the mask the averaging window centered on it. The final image is |
306 computed from the following element-wise operation: $\frac{image + filtered | 296 computed from the following element-wise operation: $\frac{image + filtered |
307 image \times mask}{mask+1}$. | 297 image \times mask}{mask+1}$. |
308 This filter has a probability of 75\% of not being applied. | 298 This filter has a probability of 75\% of not being applied. |
309 \vspace*{0mm} | 299 \vspace*{-1mm} |
310 | 300 |
311 {\bf Scratches.} | 301 {\bf Scratches.} |
312 The scratches module places line-like white patches on the image. The | 302 The scratches module places line-like white patches on the image. The |
313 lines are heavily transformed images of the digit ``1'' (one), chosen | 303 lines are heavily transformed images of the digit ``1'' (one), chosen |
314 at random among five thousand such images. The 1 image is | 304 at random among five thousand such images. The 1 image is |
320 This filter is applied only 15\% of the time. When it is applied, 50\% | 310 This filter is applied only 15\% of the time. When it is applied, 50\% |
321 of the time, only one patch image is generated and applied. In 30\% of | 311 of the time, only one patch image is generated and applied. In 30\% of |
322 cases, two patches are generated, and otherwise three patches are | 312 cases, two patches are generated, and otherwise three patches are |
323 generated. Patches are applied by taking, at each of the 32$\times$32 pixel | 313 generated. Patches are applied by taking, at each of the 32$\times$32 pixel |
324 locations, the maximum over the original image and the patches. | 314 locations, the maximum over the original image and the patches. |
325 \vspace*{0mm} | 315 \vspace*{-1mm} |
326 | 316 |
327 {\bf Grey Level and Contrast Changes.} | 317 {\bf Grey Level and Contrast Changes.} |
328 This filter changes the contrast and may invert the image polarity (white | 318 This filter changes the contrast and may invert the image polarity (white |
329 on black to black on white). The contrast $C$ is defined here as the | 319 on black to black on white). The contrast $C$ is defined here as the |
330 difference between the maximum and the minimum pixel value of the image. | 320 difference between the maximum and the minimum pixel value of the image. |
485 rate was chosen among $\{0.001, 0.01, 0.025, 0.075, 0.1, 0.5\}$ | 475 rate was chosen among $\{0.001, 0.01, 0.025, 0.075, 0.1, 0.5\}$ |
486 through preliminary experiments (measuring performance on a validation set), | 476 through preliminary experiments (measuring performance on a validation set), |
487 and $0.1$ was then selected for optimizing on the whole training sets. | 477 and $0.1$ was then selected for optimizing on the whole training sets. |
488 | 478 |
489 \begin{figure}[ht] | 479 \begin{figure}[ht] |
| | 480 \vspace*{-2mm} |
490 \centerline{\resizebox{0.8\textwidth}{!}{\includegraphics{images/denoising_autoencoder_small.pdf}}} | 481 \centerline{\resizebox{0.8\textwidth}{!}{\includegraphics{images/denoising_autoencoder_small.pdf}}} |
491 \caption{Illustration of the computations and training criterion for the denoising | 482 \caption{Illustration of the computations and training criterion for the denoising |
492 auto-encoder used to pre-train each layer of the deep architecture. Input $x$ of | 483 auto-encoder used to pre-train each layer of the deep architecture. Input $x$ of |
493 the layer (i.e. raw input or output of previous layer) | 484 the layer (i.e. raw input or output of previous layer) |
494 is corrupted into $\tilde{x}$ and encoded into code $y$ by the encoder $f_\theta(\cdot)$. | 485 is corrupted into $\tilde{x}$ and encoded into code $y$ by the encoder $f_\theta(\cdot)$. |
495 The decoder $g_{\theta'}(\cdot)$ maps $y$ to reconstruction $z$, which | 486 The decoder $g_{\theta'}(\cdot)$ maps $y$ to reconstruction $z$, which |
496 is compared to the uncorrupted input $x$ through the loss function | 487 is compared to the uncorrupted input $x$ through the loss function |
497 $L_H(x,z)$, whose expected value is approximately minimized during training | 488 $L_H(x,z)$, whose expected value is approximately minimized during training |
498 by tuning $\theta$ and $\theta'$.} | 489 by tuning $\theta$ and $\theta'$.} |
499 \label{fig:da} | 490 \label{fig:da} |
| | 491 \vspace*{-2mm} |
500 \end{figure} | 492 \end{figure} |
501 | 493 |
502 {\bf Stacked Denoising Auto-Encoders (SDA).} | 494 {\bf Stacked Denoising Auto-Encoders (SDA).} |
503 Various auto-encoder variants and Restricted Boltzmann Machines (RBMs) | 495 Various auto-encoder variants and Restricted Boltzmann Machines (RBMs) |
504 can be used to initialize the weights of each layer of a deep MLP (with many hidden | 496 can be used to initialize the weights of each layer of a deep MLP (with many hidden |
541 stacked denoising auto-encoders on MNIST~\citep{VincentPLarochelleH2008}. | 533 stacked denoising auto-encoders on MNIST~\citep{VincentPLarochelleH2008}. |
542 | 534 |
543 \vspace*{-1mm} | 535 \vspace*{-1mm} |
544 | 536 |
545 \begin{figure}[ht] | 537 \begin{figure}[ht] |
| | 538 \vspace*{-2mm} |
546 \centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}} | 539 \centerline{\resizebox{.99\textwidth}{!}{\includegraphics{images/error_rates_charts.pdf}}} |
547 \caption{Error bars indicate a 95\% confidence interval. 0 indicates that the model was trained | 540 \caption{Error bars indicate a 95\% confidence interval. 0 indicates that the model was trained |
548 on NIST, 1 on NISTP, and 2 on P07. Left: overall results | 541 on NIST, 1 on NISTP, and 2 on P07. Left: overall results |
549 of all models, on 3 different test sets (NIST, NISTP, P07). | 542 of all models, on 3 different test sets (NIST, NISTP, P07). |
550 Right: error rates on NIST test digits only, along with the previous results from | 543 Right: error rates on NIST test digits only, along with the previous results from |
551 literature~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005} | 544 literature~\citep{Granger+al-2007,Cortes+al-2000,Oliveira+al-2002-short,Milgram+al-2005} |
552 respectively based on ART, nearest neighbors, MLPs, and SVMs.} | 545 respectively based on ART, nearest neighbors, MLPs, and SVMs.} |
553 | 546 |
554 \label{fig:error-rates-charts} | 547 \label{fig:error-rates-charts} |
555 \vspace*{-1mm} | 548 \vspace*{-2mm} |
556 \end{figure} | 549 \end{figure} |
557 | 550 |
558 | 551 |
559 \section{Experimental Results} | 552 \section{Experimental Results} |
560 | 553 |
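
The slant transformation in the diff (each row shifted by $round(slant \times height)$, with $slant \sim U[-complexity, complexity]$ in the new revision) can be illustrated with a minimal Python sketch. It is not the project's implementation: `shift_rows` and `slant` are hypothetical names, images are assumed to be lists of rows with 0 as background, height is assumed to be measured from the bottom row, and pixels pushed past the border are simply dropped.

```python
import random


def shift_rows(image, s):
    """Shift each row by round(s * height), height = distance from the bottom row.

    Assumptions: positive s moves upper rows to the right; pixels shifted
    past the border are dropped and vacated pixels are set to 0 (background).
    """
    n_rows, n_cols = len(image), len(image[0])
    out = []
    for r, row in enumerate(image):
        height = n_rows - 1 - r          # assumed: bottom row has height 0
        shift = round(s * height)        # shift = round(slant * height)
        new_row = [0.0] * n_cols
        for c, v in enumerate(row):
            if 0 <= c + shift < n_cols:
                new_row[c + shift] = v
        out.append(new_row)
    return out


def slant(image, complexity, rng=random):
    s = rng.uniform(-complexity, complexity)  # slant ~ U[-complexity, complexity]
    return shift_rows(image, s)
```

Python's `round` rounds halves to even; the text does not say which rounding rule was used, so that detail is also an assumption.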
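
The thickness filter (dilation or erosion with a structuring element drawn from the $n=round(m \times complexity)$ smallest ones, $m=10$ for dilation and $m=6$ for erosion, plus a neutral element) can be sketched as below. Several details are assumptions: the ten elements are replaced by hypothetical flat rectangles in `SE_SHAPES` (only "largest is $5\times5$" comes from the text), ink is assumed to be bright so that `max` thickens strokes, and flat grey-level morphology (max/min over the element's support) stands in for the element-wise product described in the text.

```python
import random

# Hypothetical stand-ins for the ten structuring elements: flat h x w
# rectangles of increasing size, the largest being 5 x 5.
SE_SHAPES = [(1, 2), (2, 1), (2, 2), (2, 3), (3, 2),
             (3, 3), (3, 4), (4, 4), (4, 5), (5, 5)]


def morph(image, shape, op):
    """Flat grey-level dilation (op=max) or erosion (op=min).

    Each output pixel is the max/min over the in-bounds pixels covered by an
    h x w window roughly centred on it (the centre is always included).
    """
    h, w = shape
    n_rows, n_cols = len(image), len(image[0])
    drs = range(-(h // 2), h - h // 2)
    dcs = range(-(w // 2), w - w // 2)
    return [[op(image[r + dr][c + dc]
                for dr in drs for dc in dcs
                if 0 <= r + dr < n_rows and 0 <= c + dc < n_cols)
             for c in range(n_cols)]
            for r in range(n_rows)]


def thickness(image, complexity, rng=random):
    op = rng.choice([max, min])            # dilation or erosion, p = 0.5 each
    m = 10 if op is max else 6             # erosion is limited to smaller elements
    n = round(m * complexity)
    candidates = [(1, 1)] + SE_SHAPES[:n]  # (1, 1) acts as the neutral element
    return morph(image, rng.choice(candidates), op)
```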
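
The affine transformation maps each output pixel $(x,y)$ to the input pixel nearest to $(ax+by+c, dx+ey+f)$. The following Python sketch is an illustration under stated assumptions: `affine` and `sample_affine_params` are hypothetical names, out-of-image source positions give background 0, and the sampler centres the diagonal coefficients $a$ and $e$ on 1 and the off-diagonal $b$ and $d$ on 0, which is the pairing under which complexity 0 yields the identity map with this formula (our reading, not something the diff states explicitly).

```python
import random


def affine(image, params):
    """Output pixel (x, y) takes the input pixel nearest to (ax+by+c, dx+ey+f).

    Source positions outside the image give background 0 (assumption).
    """
    a, b, c, d, e, f = params
    n_rows, n_cols = len(image), len(image[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for y in range(n_rows):
        for x in range(n_cols):
            sx = round(a * x + b * y + c)   # nearest source column
            sy = round(d * x + e * y + f)   # nearest source row
            if 0 <= sx < n_cols and 0 <= sy < n_rows:
                out[y][x] = image[sy][sx]
    return out


def sample_affine_params(complexity, rng=random):
    k = complexity
    a = rng.uniform(1 - 3 * k, 1 + 3 * k)   # diagonal terms centred on 1
    e = rng.uniform(1 - 3 * k, 1 + 3 * k)
    b = rng.uniform(-3 * k, 3 * k)          # off-diagonal terms centred on 0
    d = rng.uniform(-3 * k, 3 * k)
    c = rng.uniform(-4 * k, 4 * k)          # translations
    f = rng.uniform(-4 * k, 4 * k)
    return a, b, c, d, e, f
```

For example, `affine(img, (1, 0, 1, 0, 1, 0))` reads each output pixel from one column to its right, i.e. it translates the character one pixel to the left.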
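
The local elastic deformation (two random displacement fields, uniform in $[-1,1]$, scaled by $\alpha = \sqrt[3]{complexity} \times 10$ and blurred with a Gaussian of standard deviation $\sigma = 10 - 7\sqrt[3]{complexity}$) can be sketched in pure Python. Assumptions not taken from the diff: the kernel is truncated at $3\sigma$, the blur uses zero padding, and the displaced image is resampled with bilinear interpolation as in Simard et al.; the helper names are hypothetical.

```python
import math
import random


def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian, truncated at about 3 sigma (assumption)."""
    radius = max(1, math.ceil(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def convolve_separable(field, kernel):
    """2-D blur as a row pass then a column pass, with zero padding."""
    radius = len(kernel) // 2
    n_rows, n_cols = len(field), len(field[0])

    def blur_1d(line):
        n = len(line)
        return [sum(kernel[j + radius] * line[i + j]
                    for j in range(-radius, radius + 1) if 0 <= i + j < n)
                for i in range(n)]

    rows = [blur_1d(row) for row in field]
    cols = [blur_1d([rows[r][c] for r in range(n_rows)]) for c in range(n_cols)]
    return [[cols[c][r] for c in range(n_cols)] for r in range(n_rows)]


def bilinear(image, x, y):
    """Bilinear interpolation at a non-integer (x, y); outside pixels count as 0."""
    n_rows, n_cols = len(image), len(image[0])
    x0, y0 = math.floor(x), math.floor(y)
    total = 0.0
    for yy, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xx, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= yy < n_rows and 0 <= xx < n_cols:
                total += wy * wx * image[yy][xx]
    return total


def elastic(image, complexity, rng=random):
    alpha = 10.0 * complexity ** (1 / 3)
    sigma = 10 - 7 * complexity ** (1 / 3)
    n_rows, n_cols = len(image), len(image[0])
    kernel = gaussian_kernel(sigma)
    fields = []
    for _ in range(2):  # horizontal, then vertical displacement field
        raw = [[alpha * rng.uniform(-1, 1) for _ in range(n_cols)] for _ in range(n_rows)]
        fields.append(convolve_separable(raw, kernel))
    dx, dy = fields
    return [[bilinear(image, x + dx[y][x], y + dy[y][x]) for x in range(n_cols)]
            for y in range(n_rows)]
```

Scaling before or after the blur gives the same field because convolution is linear, which is why the two revisions of the text can order these steps differently.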
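
Most noise filters in the diff are skipped with a fixed probability (60\% for occlusion, 80\% for pixel permutation, 70\% for Gaussian noise, 75\% for salt and pepper and for spatially Gaussian noise, 85\% for scratches). A minimal sketch of that pattern, using the Gaussian noise filter ($\sim {\rm Normal}(0,(\frac{complexity}{10})^2)$ per pixel) as the example, might look like this; `P_SKIP`, `gaussian_noise` and `maybe_apply` are hypothetical names, and since the text does not mention clipping, none is done.

```python
import random

# Skip probabilities quoted in the text for the noise filters.
P_SKIP = {"occlusion": 0.6, "pixel_permutation": 0.8, "gaussian_noise": 0.7,
          "salt_and_pepper": 0.75, "spatially_gaussian": 0.75, "scratches": 0.85}


def gaussian_noise(image, complexity, rng=random):
    """Add independent Normal(0, (complexity / 10)^2) noise to every pixel."""
    std = complexity / 10
    return [[v + rng.gauss(0.0, std) for v in row] for row in image]


def maybe_apply(name, filt, image, complexity, rng=random):
    """Run `filt`, unless it is skipped (probability P_SKIP[name])."""
    if rng.random() < P_SKIP[name]:
        return [row[:] for row in image]
    return filt(image, complexity, rng)
```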
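
The background-image step scales each background pixel by $\frac{max(maximage - contrast, 0)}{maxbg}$, with $contrast \sim U[complexity, 1]$, and keeps the pixel-wise maximum of background and original. A minimal Python sketch follows, assuming the $32\times32$ background patch has already been cropped (part of that selection procedure falls outside the visible hunk); the function names and the guard for an all-zero background are hypothetical additions.

```python
import random


def blend_background(image, background, contrast):
    """Scale the background by max(maximage - contrast, 0) / maxbg, then take the max."""
    max_image = max(max(row) for row in image)
    max_bg = max(max(row) for row in background)
    if max_bg == 0:                       # guard (our addition): blank background
        return [row[:] for row in image]
    scale = max(max_image - contrast, 0.0) / max_bg
    return [[max(b * scale, v) for b, v in zip(brow, irow)]
            for brow, irow in zip(background, image)]


def add_background(image, background, complexity, rng=random):
    contrast = rng.uniform(complexity, 1.0)   # contrast ~ U[complexity, 1]
    return blend_background(image, background, contrast)
```

With `contrast = 1.0` and a character whose maximum is 1, the scale is 0 and the background vanishes, which matches the remark that higher contrast yields a darker background.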
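
The last step of the spatially Gaussian noise filter builds a mask by adding a window centred on each selected pixel, then computes $\frac{image + filtered\ image \times mask}{mask+1}$ element-wise. The sketch below covers only that combination step; how pixels are selected and the exact averaging window lie partly outside the visible hunk, so `centers` and `window` are hypothetical inputs.

```python
def accumulate_mask(n_rows, n_cols, centers, window):
    """Add `window` to a zero mask, centred on each selected (row, col) pixel."""
    h, w = len(window), len(window[0])
    mask = [[0.0] * n_cols for _ in range(n_rows)]
    for r, c in centers:
        for i in range(h):
            for j in range(w):
                rr, cc = r + i - h // 2, c + j - w // 2
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    mask[rr][cc] += window[i][j]
    return mask


def combine(image, filtered, mask):
    """Element-wise (image + filtered * mask) / (mask + 1)."""
    return [[(v + f * m) / (m + 1) for v, f, m in zip(vr, fr, mr)]
            for vr, fr, mr in zip(image, filtered, mask)]
```

The output is a convex combination with weight $\frac{mask}{mask+1}$ on the smoothed image: pixels where the mask is 0 are untouched, and overlapping windows push the result closer to the blurred version.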
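
The denoising auto-encoder figure describes corrupting $x$ into $\tilde{x}$, encoding $y = f_\theta(\tilde{x})$, decoding $z = g_{\theta'}(y)$ and comparing $z$ with the clean $x$ through $L_H(x,z)$. A forward-pass sketch in pure Python is below. The corruption process is not given in the visible text, so masking noise (zeroing each input with probability `level`) is an assumption, as are the sigmoid encoder and decoder and reading $L_H$ as the cross-entropy reconstruction loss; no training loop is shown.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def corrupt(x, level, rng=random):
    """Masking noise (assumed): each input is set to 0 with probability `level`."""
    return [0.0 if rng.random() < level else v for v in x]


def affine_sigmoid(W, b, v):
    """sigmoid(W v + b) for a weight matrix given as a list of rows."""
    return [sigmoid(sum(w * vi for w, vi in zip(row, v)) + bi) for row, bi in zip(W, b)]


def cross_entropy(x, z, eps=1e-12):
    """L_H(x, z) = -sum_k [x_k log z_k + (1 - x_k) log(1 - z_k)]."""
    return -sum(xk * math.log(zk + eps) + (1 - xk) * math.log(1 - zk + eps)
                for xk, zk in zip(x, z))


def dae_loss(x, params, level, rng=random):
    W, b, W_prime, b_prime = params
    x_tilde = corrupt(x, level, rng)
    y = affine_sigmoid(W, b, x_tilde)          # y = f_theta(x_tilde)
    z = affine_sigmoid(W_prime, b_prime, y)    # z = g_theta'(y)
    return cross_entropy(x, z)                 # compared with the clean x
```

The key design point the figure makes is visible in `dae_loss`: the encoder sees the corrupted input, but the loss is computed against the uncorrupted one.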