comparison writeup/nips2010_submission.tex @ 474:bcf024e6ab23

fits now, but still now graphics
author Yoshua Bengio <bengioy@iro.umontreal.ca>
date Sun, 30 May 2010 11:18:11 -0400
parents 2dd6e8962df1
children db28764b8252
473:92d6df91939f 474:bcf024e6ab23
119 There are two main parts in the pipeline. The first one, 119 There are two main parts in the pipeline. The first one,
120 from slant to pinch below, performs transformations. The second 120 from slant to pinch below, performs transformations. The second
121 part, from blur to contrast, adds different kinds of noise. 121 part, from blur to contrast, adds different kinds of noise.
122 122
123 {\large\bf Transformations}\\ 123 {\large\bf Transformations}\\
124 {\bf Slant}\\ 124 {\bf Slant.}
125 We mimic slant by shifting each row of the image 125 We mimic slant by shifting each row of the image
126 proportionally to its height: $shift = round(slant \times height)$. 126 proportionally to its height: $shift = round(slant \times height)$.
127 The $slant$ coefficient can be negative or positive with equal probability 127 The $slant$ coefficient can be negative or positive with equal probability
128 and its value is randomly sampled according to the complexity level: 128 and its value is randomly sampled according to the complexity level:
129 $slant \sim U[0,complexity]$, so the 129 $slant \sim U[0,complexity]$, so the
130 maximum displacement for the lowest or highest pixel line is of 130 maximum displacement for the lowest or highest pixel line is of
131 $round(complexity \times 32)$.\\ 131 $round(complexity \times 32)$.\\
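The slant rule described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code used for the paper: the function name `slant_image`, the use of the row index as the "height" of a row, and the zero fill for pixels shifted in from outside the image are all assumptions.

```python
import numpy as np

def slant_image(image, complexity, rng):
    """Shift each row horizontally in proportion to its row index (hypothetical helper)."""
    height, width = image.shape
    # Magnitude ~ U[0, complexity]; the sign is positive or negative with equal probability.
    slant = rng.uniform(0.0, complexity) * rng.choice([-1.0, 1.0])
    out = np.zeros_like(image)  # assumption: vacated pixels are filled with zeros
    for row in range(height):
        # shift = round(slant * height); the row index stands in for "height" (assumption).
        shift = int(np.clip(round(slant * row), -width, width))
        if shift >= 0:
            out[row, shift:] = image[row, :width - shift]
        else:
            out[row, :width + shift] = image[row, -shift:]
    return out
```

With `complexity = 0` the sampled slant is zero and the image is returned unchanged; with a 32-row image the largest shift is bounded by about `round(complexity * 32)` pixels, in line with the text.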
132 {\bf Thickness}\\ 132 {\bf Thickness.}
133 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82} 133 Morphological operators of dilation and erosion~\citep{Haralick87,Serra82}
134 are applied. The neighborhood of each pixel is multiplied 134 are applied. The neighborhood of each pixel is multiplied
135 element-wise with a {\em structuring element} matrix. 135 element-wise with a {\em structuring element} matrix.
136 The pixel value is replaced by the maximum or the minimum of the resulting 136 The pixel value is replaced by the maximum or the minimum of the resulting
137 matrix, respectively for dilation or erosion. Ten different structural elements with 137 matrix, respectively for dilation or erosion. Ten different structural elements with
141 $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$ 141 $round(10 \times complexity)$ for dilation and $round(6 \times complexity)$
142 for erosion. A neutral element is always present in the set, and if it is 142 for erosion. A neutral element is always present in the set, and if it is
143 chosen no transformation is applied. Erosion allows only the six 143 chosen no transformation is applied. Erosion allows only the six
144 smallest structural elements because when the character is too thin it may 144 smallest structural elements because when the character is too thin it may
145 be completely erased.\\ 145 be completely erased.\\
146 {\bf Affine Transformations}\\ 146 {\bf Affine Transformations.}
147 A $2 \times 3$ affine transform matrix (with 147 A $2 \times 3$ affine transform matrix (with
148 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level. 148 6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
149 Each pixel $(x,y)$ of the output image takes the value of the pixel 149 Each pixel $(x,y)$ of the output image takes the value of the pixel
150 nearest to $(ax+by+c,dx+ey+f)$ in the input image. This 150 nearest to $(ax+by+c,dx+ey+f)$ in the input image. This
151 produces scaling, translation, rotation and shearing. 151 produces scaling, translation, rotation and shearing.
153 forbid important rotations (not to confuse classes) but to give good 153 forbid important rotations (not to confuse classes) but to give good
154 variability of the transformation: $a$ and $d$ $\sim U[1-3 \times 154 variability of the transformation: $a$ and $d$ $\sim U[1-3 \times
155 complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3 155 complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3
156 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times 156 \times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times
157 complexity]$.\\ 157 complexity]$.\\
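The nearest-pixel lookup described above, where output pixel $(x,y)$ reads the input near $(ax+by+c,dx+ey+f)$, can be sketched as follows. The function name `apply_affine` and the choice to leave output pixels at zero when the source position falls outside the input are assumptions; sampling of the six parameters from the $complexity$ level is deliberately left out.

```python
import numpy as np

def apply_affine(image, params):
    """Nearest-neighbour affine warp (hypothetical helper)."""
    a, b, c, d, e, f = params
    height, width = image.shape
    ys, xs = np.mgrid[0:height, 0:width]
    # Source coordinates for every output pixel (x, y), rounded to the nearest input pixel.
    src_x = np.rint(a * xs + b * ys + c).astype(int)
    src_y = np.rint(d * xs + e * ys + f).astype(int)
    inside = (src_x >= 0) & (src_x < width) & (src_y >= 0) & (src_y < height)
    out = np.zeros_like(image)  # assumption: out-of-range sources give zero
    out[inside] = image[src_y[inside], src_x[inside]]
    return out
```

Under this convention the parameters `(1, 0, 0, 0, 1, 0)` leave the image unchanged, and a nonzero `c` or `f` translates it.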
158 {\bf Local Elastic Deformations}\\ 158 {\bf Local Elastic Deformations.}
159 This filter induces a "wiggly" effect in the image, following~\citet{SimardSP03}, 159 This filter induces a "wiggly" effect in the image, following~\citet{SimardSP03},
160 which provides more details. 160 which provides more details.
161 Two "displacements" fields are generated and applied, for horizontal 161 Two "displacements" fields are generated and applied, for horizontal
162 and vertical displacements of pixels. 162 and vertical displacements of pixels.
163 To generate a pixel in either field, first a value between -1 and 1 is 163 To generate a pixel in either field, first a value between -1 and 1 is
166 displacements (larger $\alpha$ translates into larger wiggles). 166 displacements (larger $\alpha$ translates into larger wiggles).
167 Each field is convolved with a Gaussian 2D kernel of 167 Each field is convolved with a Gaussian 2D kernel of
168 standard deviation $\sigma$. Visually, this results in a blur. 168 standard deviation $\sigma$. Visually, this results in a blur.
169 $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times 169 $\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times
170 \sqrt[3]{complexity}$.\\ 170 \sqrt[3]{complexity}$.\\
171 {\bf Pinch}\\ 171 {\bf Pinch.}
172 This GIMP filter is named "Whirl and 172 This GIMP filter is named "Whirl and
173 pinch", but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic 173 pinch", but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic
174 surface and pressing or pulling on the center of the surface''~\citep{GIMP-manual}. 174 surface and pressing or pulling on the center of the surface''~\citep{GIMP-manual}.
175 For a square input image, think of drawing a circle of 175 For a square input image, think of drawing a circle of
176 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to 176 radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
183 The actual value is given by bilinear interpolation considering the pixels 183 The actual value is given by bilinear interpolation considering the pixels
184 around the (non-integer) source position thus found. 184 around the (non-integer) source position thus found.
185 Here $pinch \sim U[-complexity, 0.7 \times complexity]$.\\ 185 Here $pinch \sim U[-complexity, 0.7 \times complexity]$.\\
186 186
187 {\large\bf Injecting Noise}\\ 187 {\large\bf Injecting Noise}\\
188 {\bf Motion Blur}\\ 188 {\bf Motion Blur.}
189 This GIMP filter is a ``linear motion blur'' in GIMP 189 This GIMP filter is a ``linear motion blur'' in GIMP
190 terminology, with two parameters, $length$ and $angle$. The value of 190 terminology, with two parameters, $length$ and $angle$. The value of
191 a pixel in the final image is approximately the mean value of the first $length$ pixels 191 a pixel in the final image is approximately the mean value of the first $length$ pixels
192 found by moving in the $angle$ direction. 192 found by moving in the $angle$ direction.
193 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.\\ 193 Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.\\
194 {\bf Occlusion}\\ 194 {\bf Occlusion.}
195 This filter selects a random rectangle from an {\em occluder} character 195 This filter selects a random rectangle from an {\em occluder} character
196 image and places it over the original {\em occluded} character 196 image and places it over the original {\em occluded} character
197 image. Pixels are combined by taking the max(occluder,occluded), 197 image. Pixels are combined by taking the max(occluder,occluded),
198 closer to black. The rectangle corners 198 closer to black. The rectangle corners
199 are sampled so that larger complexity gives larger rectangles. 199 are sampled so that larger complexity gives larger rectangles.
200 The destination position in the occluded image is also sampled 200 The destination position in the occluded image is also sampled
201 according to a normal distribution (see more details in~\citet{ift6266-tr-anonymous}). 201 according to a normal distribution (see more details in~\citet{ift6266-tr-anonymous}).
202 It has a probability of not being applied at all of 60\%.\\ 202 It has a probability of not being applied at all of 60\%.\\
203 {\bf Pixel Permutation}\\ 203 {\bf Pixel Permutation.}
204 This filter permutes neighbouring pixels. It selects first 204 This filter permutes neighbouring pixels. It selects first
205 $\frac{complexity}{3}$ pixels randomly in the image. Each of them is then 205 $\frac{complexity}{3}$ pixels randomly in the image. Each of them is then
206 sequentially exchanged with one other pixel in its $V4$ neighbourhood. The numbers 206 sequentially exchanged with one other pixel in its $V4$ neighbourhood. The numbers
207 of exchanges to the left, right, top and bottom are equal, or differ by 207 of exchanges to the left, right, top and bottom are equal, or differ by
208 at most 1 if the number of selected pixels is not a multiple of 4. 208 at most 1 if the number of selected pixels is not a multiple of 4.
209 It has a probability of not being applied at all of 80\%.\\ 209 It has a probability of not being applied at all of 80\%.\\
210 {\bf Gaussian Noise}\\ 210 {\bf Gaussian Noise.}
211 This filter simply adds, to each pixel of the image independently, a 211 This filter simply adds, to each pixel of the image independently, a
212 noise $\sim Normal(0,(\frac{complexity}{10})^2)$. 212 noise $\sim Normal(0,(\frac{complexity}{10})^2)$.
213 It has a probability of not being applied at all of 70\%.\\ 213 It has a probability of not being applied at all of 70\%.\\
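As a rough illustration of the Gaussian noise filter, the sketch below adds independent zero-mean noise with standard deviation $complexity/10$ to every pixel and skips the filter 70\% of the time. The function name `add_gaussian_noise` and the `skip_prob` argument are hypothetical, and no clipping of the result is performed because the text does not mention any.

```python
import numpy as np

def add_gaussian_noise(image, complexity, rng, skip_prob=0.7):
    """Per-pixel Gaussian noise, applied with probability 1 - skip_prob (hypothetical helper)."""
    # The filter is not applied at all with probability skip_prob.
    if rng.random() < skip_prob:
        return image.copy()
    # Standard deviation complexity / 10, i.e. variance (complexity / 10)^2.
    noise = rng.normal(0.0, complexity / 10.0, size=image.shape)
    return image + noise
```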
214 {\bf Background Images}\\ 214 {\bf Background Images.}
215 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random 215 Following~\citet{Larochelle-jmlr-2009}, this transformation adds a random
216 background behind the letter. The background is chosen by first selecting, 216 background behind the letter. The background is chosen by first selecting,
217 at random, an image from a set of images. Then a 32$\times$32 subregion 217 at random, an image from a set of images. Then a 32$\times$32 subregion
218 of that image is chosen as the background image (by sampling position 218 of that image is chosen as the background image (by sampling position
219 uniformly while making sure not to cross image borders). 219 uniformly while making sure not to cross image borders).
222 intensity) for both the original image and the background image, $maximage$ 222 intensity) for both the original image and the background image, $maximage$
223 and $maxbg$. We also have a parameter $contrast \sim U[complexity, 1]$. 223 and $maxbg$. We also have a parameter $contrast \sim U[complexity, 1]$.
224 Each background pixel value is multiplied by $\frac{max(maximage - 224 Each background pixel value is multiplied by $\frac{max(maximage -
225 contrast, 0)}{maxbg}$ (higher contrast yields darker 225 contrast, 0)}{maxbg}$ (higher contrast yields darker
226 background). The output image pixels are max(background,original).\\ 226 background). The output image pixels are max(background,original).\\
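The background blending step can be sketched as below, assuming the 32$\times$32 background patch has already been cropped and that both arrays hold intensities in $[0,1]$. The function name `add_background` and the guard against an all-zero background are assumptions, not part of the description above.

```python
import numpy as np

def add_background(image, background, complexity, rng):
    """Blend a background patch behind a character image (hypothetical helper)."""
    contrast = rng.uniform(complexity, 1.0)  # contrast ~ U[complexity, 1]
    max_image = image.max()
    max_bg = background.max()
    if max_bg > 0:
        # Scale the background by max(maximage - contrast, 0) / maxbg.
        scaled = background * max(max_image - contrast, 0.0) / max_bg
    else:
        scaled = np.zeros_like(background)  # assumption: avoid dividing by zero
    # Output pixels are max(background, original).
    return np.maximum(scaled, image)
```

Because the output is a pixel-wise maximum, the character is never darkened; a contrast close to 1 drives the scaled background toward zero.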
227 {\bf Salt and Pepper Noise}\\ 227 {\bf Salt and Pepper Noise.}
228 This filter adds noise $\sim U[0,1]$ to random subsets of pixels. 228 This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
229 The number of selected pixels is $0.2 \times complexity$. 229 The number of selected pixels is $0.2 \times complexity$.
230 This filter has a probability of not being applied at all of 75\%.\\ 230 This filter has a probability of not being applied at all of 75\%.\\
231 {\bf Spatially Gaussian Noise}\\ 231 {\bf Spatially Gaussian Noise.}
232 Different regions of the image are spatially smoothed. 232 Different regions of the image are spatially smoothed.
233 The image is convolved with a symmetric Gaussian kernel of 233 The image is convolved with a symmetric Gaussian kernel of
234 size and variance chosen uniformly in the ranges $[12,12 + 20 \times 234 size and variance chosen uniformly in the ranges $[12,12 + 20 \times
235 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized 235 complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
236 between $0$ and $1$. We also create a symmetric averaging window, of the 236 between $0$ and $1$. We also create a symmetric averaging window, of the
240 initialize to zero a mask matrix of the image size. For each selected pixel 240 initialize to zero a mask matrix of the image size. For each selected pixel
241 we add to the mask the averaging window centered on it. The final image is 241 we add to the mask the averaging window centered on it. The final image is
242 computed from the following element-wise operation: $\frac{image + filtered 242 computed from the following element-wise operation: $\frac{image + filtered
243 image \times mask}{mask+1}$. 243 image \times mask}{mask+1}$.
244 This filter has a probability of not being applied at all of 75\%.\\ 244 This filter has a probability of not being applied at all of 75\%.\\
245 {\bf Scratches}\\ 245 {\bf Scratches.}
246 The scratches module places line-like white patches on the image. The 246 The scratches module places line-like white patches on the image. The
247 lines are heavily transformed images of the digit "1" (one), chosen 247 lines are heavily transformed images of the digit "1" (one), chosen
248 at random among five thousand such 1 images. The 1 image is 248 at random among five thousand such 1 images. The 1 image is
249 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times 249 randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
250 complexity)^2)$, using bicubic interpolation, 250 complexity)^2)$, using bicubic interpolation,
254 This filter is applied only 15\% of the time. When it is applied, 50\% 254 This filter is applied only 15\% of the time. When it is applied, 50\%
255 of the time, only one patch image is generated and applied. In 30\% of 255 of the time, only one patch image is generated and applied. In 30\% of
256 cases, two patches are generated, and otherwise three patches are 256 cases, two patches are generated, and otherwise three patches are
257 generated. The patch is applied by taking the maximal value on any given 257 generated. The patch is applied by taking the maximal value on any given
258 patch or the original image, for each of the 32x32 pixel locations.\\ 258 patch or the original image, for each of the 32x32 pixel locations.\\
259 {\bf Color and Contrast Changes}\\ 259 {\bf Color and Contrast Changes.}
260 This filter changes the contrast and may invert the image polarity (white 260 This filter changes the contrast and may invert the image polarity (white
261 on black to black on white). The contrast $C$ is defined here as the 261 on black to black on white). The contrast $C$ is defined here as the
262 difference between the maximum and the minimum pixel value of the image. 262 difference between the maximum and the minimum pixel value of the image.
263 Contrast $\sim U[1-0.85 \times complexity,1]$ (so contrast $\geq 0.15$). 263 Contrast $\sim U[1-0.85 \times complexity,1]$ (so contrast $\geq 0.15$).
264 The image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The 264 The image is normalized into $[\frac{1-C}{2},1-\frac{1-C}{2}]$. The
358 The MLP has a single hidden layer with $\tanh$ activation functions, and softmax (normalized 358 The MLP has a single hidden layer with $\tanh$ activation functions, and softmax (normalized
359 exponentials) on the output layer for estimating P(class | image). 359 exponentials) on the output layer for estimating P(class | image).
360 The hyper-parameters are the following: number of hidden units, taken in 360 The hyper-parameters are the following: number of hidden units, taken in
361 $\{300,500,800,1000,1500\}$. The optimization procedure is as follows. Training 361 $\{300,500,800,1000,1500\}$. The optimization procedure is as follows. Training
362 examples are presented in minibatches of size 20. A constant learning 362 examples are presented in minibatches of size 20. A constant learning
363 rate is chosen in $\{10^{-6},10^{-5},10^{-4},10^{-3},0.01, 0.025, 0.075, 0.1, 0.5\}$ 363 rate is chosen in $10^{-3},0.01, 0.025, 0.075, 0.1, 0.5\}$
364 through preliminary experiments, and 0.1 was selected. 364 through preliminary experiments, and 0.1 was selected.
365 365
366 366
367 \subsubsection{Stacked Denoising Auto-Encoders (SDAE)} 367 \subsubsection{Stacked Denoising Auto-Encoders (SDAE)}
368 \label{SdA} 368 \label{SdA}