comparison writeup/nips2010_submission.tex @ 474:bcf024e6ab23
fits now, but still now graphics
author | Yoshua Bengio <bengioy@iro.umontreal.ca> |
---|---|
date | Sun, 30 May 2010 11:18:11 -0400 |
parents | 2dd6e8962df1 |
children | db28764b8252 |
473:92d6df91939f | 474:bcf024e6ab23 |
---|---|
119 There are two main parts in the pipeline. The first one, | 119 There are two main parts in the pipeline. The first one, |
120 from slant to pinch below, performs transformations. The second | 120 from slant to pinch below, performs transformations. The second |
121 part, from blur to contrast, adds different kinds of noise. | 121 part, from blur to contrast, adds different kinds of noise. |
122 | 122 |
123 {\large\bf Transformations}\\ | 123 {\large\bf Transformations}\\ |
124 {\bf Slant}\\ | 124 {\bf Slant.} |
125 We mimic slant by shifting each row of the image | 125 We mimic slant by shifting each row of the image |
126 proportionally to its height: $shift = round(slant \times height)$. | 126 proportionally to its height: $shift = round(slant \times height)$. |
127 The $slant$ coefficient can be negative or positive with equal probability | 127 The $slant$ coefficient can be negative or positive with equal probability |
128 and its value is randomly sampled according to the complexity level: | 128 and its value is randomly sampled according to the complexity level: |
129 $slant \sim U[0,complexity]$, so the | 129 $slant \sim U[0,complexity]$, so the |
130 maximum displacement for the lowest or highest pixel line is of | 130 maximum displacement for the lowest or highest pixel line is of |
131 $round(complexity \times 32)$.\\ | 131 $round(complexity \times 32)$.\\ |
132 {\bf Thickness}\\ | 132 {\bf Thickness.} |
133 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} | 133 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} |
134 are applied. The neighborhood of each pixel is multiplied | 134 are applied. The neighborhood of each pixel is multiplied |
135 element-wise with a {\em structuring element} matrix. | 135 element-wise with a {\em structuring element} matrix. |
136 The pixel value is replaced by the maximum or the minimum of the resulting | 136 The pixel value is replaced by the maximum or the minimum of the resulting |
137 matrix, respectively for dilation or erosion. Ten different structural elements with | 137 matrix, respectively for dilation or erosion. Ten different structural elements with |
141 $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$ | 141 $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$ |
142 for erosion. A neutral element is always present in the set, and if it is | 142 for erosion. A neutral element is always present in the set, and if it is |
143 chosen no transformation is applied. Erosion allows only the six | 143 chosen no transformation is applied. Erosion allows only the six |
144 smallest structural elements because when the character is too thin it may | 144 smallest structural elements because when the character is too thin it may |
145 be completely erased.\\ | 145 be completely erased.\\ |
146 {\bf Affine Transformations}\\ | 146 {\bf Affine Transformations.} |
147 A $2 \times 3$ affine transform matrix (with | 147 A $2 \times 3$ affine transform matrix (with |
148 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level. | 148 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level. |
149 Each pixel $(x,y)$ of the output image takes the value of the pixel | 149 Each pixel $(x,y)$ of the output image takes the value of the pixel |
150 nearest to $(ax+by+c,dx+ey+f)$ in the input image. This | 150 nearest to $(ax+by+c,dx+ey+f)$ in the input image. This |
151 produces scaling, translation, rotation and shearing. | 151 produces scaling, translation, rotation and shearing. |
153 forbid large rotations (so as not to confuse classes) but to give good | 153 forbid large rotations (so as not to confuse classes) but to give good |
154 variability of the transformation: $a$ and $d$ $\sim U[1-3 \times | 154 variability of the transformation: $a$ and $d$ $\sim U[1-3 \times |
155 complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3 | 155 complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3 |
156 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times | 156 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times |
157 complexity]$.\\ | 157 complexity]$.\\ |
158 {\bf Local Elastic Deformations}\\ | 158 {\bf Local Elastic Deformations.} |
159 This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03}, | 159 This filter induces a ``wiggly'' effect in the image, following~\citet{SimardSP03}, |
160 which provides more details. | 160 which provides more details. |
161 Two ``displacement'' fields are generated and applied, for horizontal | 161 Two ``displacement'' fields are generated and applied, for horizontal |
162 and vertical displacements of pixels. | 162 and vertical displacements of pixels. |
163 To generate a pixel in either field, first a value between -1 and 1 is | 163 To generate a pixel in either field, first a value between -1 and 1 is |
166 displacements (larger $\alpha$ translates into larger wiggles). | 166 displacements (larger $\alpha$ translates into larger wiggles). |
167 Each field is convolved with a 2D Gaussian kernel of | 167 Each field is convolved with a 2D Gaussian kernel of |
168 standard deviation $\sigma$. Visually, this results in a blur. | 168 standard deviation $\sigma$. Visually, this results in a blur. |
169 $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times | 169 $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times |
170 \sqrt[3]{complexity}$.\\ | 170 \sqrt[3]{complexity}$.\\ |
171 {\bf Pinch}\\ | 171 {\bf Pinch.} |
172 This GIMP filter is named ``Whirl and | 172 This GIMP filter is named ``Whirl and |
173 pinch'', but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic | 173 pinch'', but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic |
174 surface and pressing or pulling on the center of the surface''~\citep{GIMP-manual}. | 174 surface and pressing or pulling on the center of the surface''~\citep{GIMP-manual}. |
175 For a square input image, think of drawing a circle of | 175 For a square input image, think of drawing a circle of |
176 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to | 176 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to |
183 The actual value is given by bilinear interpolation considering the pixels | 183 The actual value is given by bilinear interpolation considering the pixels |
184 around the (non-integer) source position thus found. | 184 around the (non-integer) source position thus found. |
185 Here $pinch \sim U[-complexity, 0.7 \times complexity]$.\\ | 185 Here $pinch \sim U[-complexity, 0.7 \times complexity]$.\\ |
186 | 186 |
187 {\large\bf Injecting Noise}\\ | 187 {\large\bf Injecting Noise}\\ |
188 {\bf Motion Blur}\\ | 188 {\bf Motion Blur.} |
189 This GIMP filter is a ``linear motion blur'' in GIMP | 189 This GIMP filter is a ``linear motion blur'' in GIMP |
190 terminology, with two parameters, $length$ and $angle$. The value of | 190 terminology, with two parameters, $length$ and $angle$. The value of |
191 a pixel in the final image is approximately the mean value of the first $length$ pixels | 191 a pixel in the final image is approximately the mean value of the first $length$ pixels |
192 found by moving in the $angle$ direction. | 192 found by moving in the $angle$ direction. |
193 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.\\ | 193 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.\\ |
194 {\bf Occlusion}\\ | 194 {\bf Occlusion.} |
195 This filter selects a random rectangle from an {\em occluder} character | 195 This filter selects a random rectangle from an {\em occluder} character |
196 image and places it over the original {\em occluded} character | 196 image and places it over the original {\em occluded} character |
197 image. Pixels are combined by taking the max(occluder,occluded), | 197 image. Pixels are combined by taking the max(occluder,occluded), |
198 closer to black. The rectangle corners | 198 closer to black. The rectangle corners |
199 are sampled so that larger complexity gives larger rectangles. | 199 are sampled so that larger complexity gives larger rectangles. |
200 The destination position in the occluded image is also sampled | 200 The destination position in the occluded image is also sampled |
201 according to a normal distribution (see more details in~\citet{ift6266-tr-anonymous}). | 201 according to a normal distribution (see more details in~\citet{ift6266-tr-anonymous}). |
202 It has a probability of not being applied at all of 60\%.\\ | 202 It has a probability of not being applied at all of 60\%.\\ |
203 {\bf Pixel Permutation}\\ | 203 {\bf Pixel Permutation.} |
204 This filter permutes neighbouring pixels. It first selects | 204 This filter permutes neighbouring pixels. It first selects |
205 $\frac{complexity}{3}$ pixels randomly in the image. Each of them is then | 205 $\frac{complexity}{3}$ pixels randomly in the image. Each of them is then |
206 sequentially exchanged with one other pixel in its $V4$ neighbourhood. The numbers | 206 sequentially exchanged with one other pixel in its $V4$ neighbourhood. The numbers |
207 of exchanges to the left, right, top and bottom are equal, or differ | 207 of exchanges to the left, right, top and bottom are equal, or differ |
208 by at most 1 if the number of selected pixels is not a multiple of 4. | 208 by at most 1 if the number of selected pixels is not a multiple of 4. |
209 It has a probability of not being applied at all of 80\%.\\ | 209 It has a probability of not being applied at all of 80\%.\\ |
210 {\bf Gaussian Noise}\\ | 210 {\bf Gaussian Noise.} |
211 This filter simply adds, to each pixel of the image independently, a | 211 This filter simply adds, to each pixel of the image independently, a |
212 noise $\sim Normal(0,(\frac{complexity}{10})^2)$. | 212 noise $\sim Normal(0,(\frac{complexity}{10})^2)$. |
213 It has a probability of not being applied at all of 70\%.\\ | 213 It has a probability of not being applied at all of 70\%.\\ |
214 {\bf Background Images}\\ | 214 {\bf Background Images.} |
215 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random | 215 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random |
216 background behind the letter. The background is chosen by first selecting, | 216 background behind the letter. The background is chosen by first selecting, |
217 at random, an image from a set of images. Then a 32$\times$32 subregion | 217 at random, an image from a set of images. Then a 32$\times$32 subregion |
218 of that image is chosen as the background image (by sampling position | 218 of that image is chosen as the background image (by sampling position |
219 uniformly while making sure not to cross image borders). | 219 uniformly while making sure not to cross image borders). |
222 intensity) for both the original image and the background image, $maximage$ | 222 intensity) for both the original image and the background image, $maximage$ |
223 and $maxbg$. We also have a parameter $contrast \sim U[complexity, 1]$. | 223 and $maxbg$. We also have a parameter $contrast \sim U[complexity, 1]$. |
224 Each background pixel value is multiplied by $\frac{max(maximage - | 224 Each background pixel value is multiplied by $\frac{max(maximage - |
225 contrast, 0)}{maxbg}$ (higher contrast yields darker | 225 contrast, 0)}{maxbg}$ (higher contrast yields darker |
226 background). The output image pixels are max(background,original).\\ | 226 background). The output image pixels are max(background,original).\\ |
227 {\bf Salt and Pepper Noise}\\ | 227 {\bf Salt and Pepper Noise.} |
228 This filter adds noise $\sim U[0,1]$ to random subsets of pixels. | 228 This filter adds noise $\sim U[0,1]$ to random subsets of pixels. |
229 The number of selected pixels is $0.2 \times complexity$. | 229 The number of selected pixels is $0.2 \times complexity$. |
230 This filter has a probability of not being applied at all of 75\%.\\ | 230 This filter has a probability of not being applied at all of 75\%.\\ |
231 {\bf Spatially Gaussian Noise}\\ | 231 {\bf Spatially Gaussian Noise.} |
232 Different regions of the image are spatially smoothed. | 232 Different regions of the image are spatially smoothed. |
233 The image is convolved with a symmetric Gaussian kernel of | 233 The image is convolved with a symmetric Gaussian kernel of |
234 size and variance chosen uniformly in the ranges $[12,12 + 20 \times | 234 size and variance chosen uniformly in the ranges $[12,12 + 20 \times |
235 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized | 235 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized |
236 between $0$ and $1$. We also create a symmetric averaging window, of the | 236 between $0$ and $1$. We also create a symmetric averaging window, of the |
240 initialize to zero a mask matrix of the image size. For each selected pixel | 240 initialize to zero a mask matrix of the image size. For each selected pixel |
241 we add to the mask the averaging window centered on it. The final image is | 241 we add to the mask the averaging window centered on it. The final image is |
242 computed from the following element-wise operation: $\frac{image + filtered | 242 computed from the following element-wise operation: $\frac{image + filtered |
243 image \times mask}{mask+1}$. | 243 image \times mask}{mask+1}$. |
244 This filter has a probability of not being applied at all of 75\%.\\ | 244 This filter has a probability of not being applied at all of 75\%.\\ |
245 {\bf Scratches}\\ | 245 {\bf Scratches.} |
246 The scratches module places line-like white patches on the image. The | 246 The scratches module places line-like white patches on the image. The |
247 lines are heavily transformed images of the digit ``1'' (one), chosen | 247 lines are heavily transformed images of the digit ``1'' (one), chosen |
248 at random among five thousand such 1 images. The 1 image is | 248 at random among five thousand such 1 images. The 1 image is |
249 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times | 249 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times |
250 complexity)^2)$, using bicubic interpolation, | 250 complexity)^2)$, using bicubic interpolation, |
254 This filter is applied only 15\% of the time. When it is applied, 50\% | 254 This filter is applied only 15\% of the time. When it is applied, 50\% |
255 of the time, only one patch image is generated and applied. In 30\% of | 255 of the time, only one patch image is generated and applied. In 30\% of |
256 cases, two patches are generated, and otherwise three patches are | 256 cases, two patches are generated, and otherwise three patches are |
257 generated. The patch is applied by taking the maximal value on any given | 257 generated. The patch is applied by taking the maximal value on any given |
258 patch or the original image, for each of the 32x32 pixel locations.\\ | 258 patch or the original image, for each of the 32x32 pixel locations.\\ |
259 {\bf Color and Contrast Changes}\\ | 259 {\bf Color and Contrast Changes.} |
260 This filter changes the contrast and may invert the image polarity (white | 260 This filter changes the contrast and may invert the image polarity (white |
261 on black to black on white). The contrast $C$ is defined here as the | 261 on black to black on white). The contrast $C$ is defined here as the |
262 difference between the maximum and the minimum pixel value of the image. | 262 difference between the maximum and the minimum pixel value of the image. |
263 Contrast $\sim U[1-0.85 \times complexity,1]$ (so contrast $\geq 0.15$). | 263 Contrast $\sim U[1-0.85 \times complexity,1]$ (so contrast $\geq 0.15$). |
264 The image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The | 264 The image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The |
358 The MLP has a single hidden layer with $\tanh$ activation functions, and softmax (normalized | 358 The MLP has a single hidden layer with $\tanh$ activation functions, and softmax (normalized |
359 exponentials) on the output layer for estimating P(class | image). | 359 exponentials) on the output layer for estimating P(class | image). |
360 The hyper-parameters are the following: number of hidden units, taken in | 360 The hyper-parameters are the following: number of hidden units, taken in |
361 $\{300,500,800,1000,1500\}$. The optimization procedure is as follows. Training | 361 $\{300,500,800,1000,1500\}$. The optimization procedure is as follows. Training |
362 examples are presented in minibatches of size 20. A constant learning | 362 examples are presented in minibatches of size 20. A constant learning |
363 rate is chosen in $\{10^{-6},10^{-5},10^{-4},10^{-3},0.01, 0.025, 0.075, 0.1, 0.5\}$ | 363 rate is chosen in $\{10^{-3},0.01, 0.025, 0.075, 0.1, 0.5\}$ |
364 through preliminary experiments, and 0.1 was selected. | 364 through preliminary experiments, and 0.1 was selected. |
365 | 365 |
366 | 366 |
367 \subsubsection{Stacked Denoising Auto-Encoders (SDAE)} | 367 \subsubsection{Stacked Denoising Auto-Encoders (SDAE)} |
368 \label{SdA} | 368 \label{SdA} |