Mercurial > ift6266
comparison writeup/nips2010_submission.tex @ 467:e0e57270b2af
author: Yoshua Bengio <bengioy@iro.umontreal.ca>
date: Sat, 29 May 2010 16:50:03 -0400
parents: 6205481bf33f
children: d02d288257bf
\end{enumerate}
The experimental results presented here provide positive evidence towards all of these questions.

\section{Perturbation and Transformation of Character Images}

This section describes the different transformations we used to stochastically
transform source images in order to obtain data. More details can
be found in this technical report~\cite{ift6266-tr-anonymous}.
The code for these transformations (mostly python) is available at
{\tt http://anonymous.url.net}. All the modules in the pipeline share
a global control parameter ($0 \le complexity \le 1$) that allows one to modulate the
amount of deformation or noise introduced.

There are two main parts in the pipeline. The first one,
from slant to pinch below, performs transformations of the character. The second
part, from blur to contrast, adds different kinds of noise.
{\large\bf Transformations}\\
{\bf Slant}\\
We mimic slant by shifting each row of the image
proportionally to its height: $shift = round(slant \times height)$.
The $slant$ coefficient can be negative or positive with equal probability
and its value is randomly sampled according to the complexity level:
$slant \sim U[0,complexity]$, so the
maximum displacement for the lowest or highest pixel line is
$round(complexity \times 32)$.\\
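As a concrete illustration, here is a minimal NumPy sketch of the row shift just described. The function name {\tt slant\_image} and the 0-filled borders are our choices for the sketch, not taken from the actual pipeline code.

```python
import numpy as np

def slant_image(img, slant):
    """Shift row y of img horizontally by round(slant * y);
    pixels shifted in from outside the frame are 0 (background)."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        s = int(round(slant * y))  # discrete displacement, as in the text
        for x in range(w):
            if 0 <= x - s < w:
                out[y, x] = img[y, x - s]
    return out

rng = np.random.RandomState(0)
complexity = 0.5
# sign with equal probability, magnitude ~ U[0, complexity]
slant = rng.choice([-1, 1]) * rng.uniform(0, complexity)
img = np.zeros((32, 32)); img[:, 16] = 1.0   # a vertical stroke
slanted = slant_image(img, slant)
```

Row 0 is never shifted, so the top of the character stays anchored while lower rows drift sideways.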
{\bf Thickness}\\
Morphological operators of dilation and erosion~\cite{Haralick87,Serra82}
are applied. The neighborhood of each pixel is multiplied
element-wise with a {\em structuring element} matrix.
The pixel value is replaced by the maximum or the minimum of the resulting
matrix, respectively for dilation or erosion. Ten different structural elements with
increasing dimensions (the largest is $5\times5$) were used. For each image, we
randomly sample the operator type (dilation or erosion) with equal probability and one structural
element from a subset of the $n$ smallest structuring elements, where $n$ is
$round(10 \times complexity)$ for dilation and $round(6 \times complexity)$
for erosion. A neutral element is always present in the set, and if it is
chosen no transformation is applied. Erosion allows only the six
smallest structural elements because when the character is too thin it may
be completely erased.\\
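A NumPy-only sketch of this selection scheme follows. The naive {\tt morph} routine and the particular set of flat rectangular footprints (with $(1,1)$ as the neutral element) are our assumptions; the paper does not specify the exact shapes of its ten structuring elements.

```python
import numpy as np

def morph(img, fp, dilate=True):
    """Naive greyscale dilation/erosion with a flat binary structuring
    element fp: each pixel becomes the max (dilation) or min (erosion)
    of its neighbourhood under the mask."""
    fh, fw = fp.shape
    ph, pw = fh // 2, fw // 2
    pad_val = img.min() if dilate else img.max()
    padded = np.pad(img, ((ph, fh - 1 - ph), (pw, fw - 1 - pw)),
                    mode='constant', constant_values=pad_val)
    out = np.empty_like(img)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            win = padded[y:y + fh, x:x + fw][fp > 0]
            out[y, x] = win.max() if dilate else win.min()
    return out

def change_thickness(img, complexity, rng):
    """Pick dilation/erosion with equal probability and one of the n
    smallest footprints; sizes below are illustrative, (1,1) is neutral."""
    sizes = [(1, 1), (1, 2), (2, 1), (2, 2), (2, 3),
             (3, 2), (3, 3), (3, 4), (4, 4), (5, 5)]
    dilate = rng.rand() < 0.5
    n = int(round((10 if dilate else 6) * complexity))
    fh, fw = sizes[rng.randint(0, max(n, 1))]
    return morph(img, np.ones((fh, fw)), dilate)

img0 = np.zeros((8, 8)); img0[4, 4] = 1.0
thickened = change_thickness(img0, 0.8, np.random.RandomState(2))
```

At $complexity = 0$ only the neutral $(1,1)$ element can be drawn, so the image passes through unchanged.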
{\bf Affine Transformations}\\
A $2 \times 3$ affine transform matrix (with
6 parameters $(a,b,c,d,e,f)$) is sampled according to the $complexity$ level.
Each pixel $(x,y)$ of the output image takes the value of the pixel
nearest to $(ax+by+c,dx+ey+f)$ in the input image. This
produces scaling, translation, rotation and shearing.
The marginal distributions of $(a,b,c,d,e,f)$ have been tuned by hand to
forbid large rotations (which would confuse classes) but to give good
variability of the transformation: $a$ and $d$ $\sim U[1-3 \times
complexity,1+3 \times complexity]$, $b$ and $e$ $\sim U[-3 \times complexity,3
\times complexity]$ and $c$ and $f$ $\sim U[-4 \times complexity, 4 \times
complexity]$.\\
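The nearest-pixel lookup above can be sketched directly in NumPy; the function name and the 0 value for out-of-image sources are our choices for the sketch.

```python
import numpy as np

def affine_transform(img, a, b, c, d, e, f):
    """Output pixel (x, y) takes the input pixel nearest to
    (a*x + b*y + c, d*x + e*y + f); out-of-image sources give 0."""
    h, w = img.shape
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            sx = int(round(a * x + b * y + c))
            sy = int(round(d * x + e * y + f))
            if 0 <= sx < w and 0 <= sy < h:
                out[y, x] = img[sy, sx]
    return out

rng = np.random.RandomState(0)
cpx = 0.2
a = rng.uniform(1 - 3 * cpx, 1 + 3 * cpx)
d = rng.uniform(1 - 3 * cpx, 1 + 3 * cpx)
b, e = rng.uniform(-3 * cpx, 3 * cpx, 2)
c, f = rng.uniform(-4 * cpx, 4 * cpx, 2)
img = np.random.RandomState(1).rand(32, 32)
warped = affine_transform(img, a, b, c, d, e, f)
```

With $(a,b,c,d,e,f) = (1,0,0,0,1,0)$ the mapping is the identity, which is a convenient sanity check.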
{\bf Local Elastic Deformations}\\
This filter induces a ``wiggly'' effect in the image, following~\cite{SimardSP03},
which provides more details.
Two ``displacement'' fields are generated and applied, for horizontal
and vertical displacements of pixels.
To generate a pixel in either field, first a value between -1 and 1 is
chosen from a uniform distribution. Then all the pixels, in both fields, are
multiplied by a constant $\alpha$ which controls the intensity of the
displacements (larger $\alpha$ translates into larger wiggles).
Each field is convolved with a Gaussian 2D kernel of
standard deviation $\sigma$. Visually, this results in a blur. We use
$\alpha = \sqrt[3]{complexity} \times 10.0$ and $\sigma = 10 - 7 \times
\sqrt[3]{complexity}$.\\
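A compact NumPy sketch of the field generation and lookup follows. The nearest-neighbour lookup and the capped kernel radius are simplifications of ours; the original follows \cite{SimardSP03} more closely.

```python
import numpy as np

def gaussian_kernel(sigma, rmax=15):
    """1D Gaussian kernel; radius capped so it never outgrows a 32-pixel image."""
    r = max(1, min(int(3 * sigma), rmax))
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    return k / k.sum()

def smooth(field, sigma):
    """Separable 2D Gaussian blur: one 1D convolution per axis."""
    k = gaussian_kernel(sigma)
    blur = lambda m: np.convolve(m, k, mode='same')
    return np.apply_along_axis(blur, 1, np.apply_along_axis(blur, 0, field))

def elastic_deform(img, complexity, rng):
    """Uniform fields in [-1,1] scaled by alpha, blurred with std sigma,
    then used as per-pixel source offsets (0 outside the borders)."""
    h, w = img.shape
    alpha = complexity ** (1 / 3.0) * 10.0
    sigma = 10 - 7 * complexity ** (1 / 3.0)
    dx = smooth(rng.uniform(-1, 1, (h, w)) * alpha, sigma)
    dy = smooth(rng.uniform(-1, 1, (h, w)) * alpha, sigma)
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            sx = int(round(x + dx[y, x]))
            sy = int(round(y + dy[y, x]))
            if 0 <= sx < w and 0 <= sy < h:
                out[y, x] = img[sy, sx]
    return out

img = np.random.RandomState(1).rand(32, 32)
wiggled = elastic_deform(img, 0.5, np.random.RandomState(0))
```

At $complexity = 0$ the scale $\alpha$ vanishes, both fields are zero, and the image is returned unchanged.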
{\bf Pinch}\\
This GIMP filter is named ``Whirl and
pinch'', but whirl was set to 0. A pinch is ``similar to projecting the image onto an elastic
surface and pressing or pulling on the center of the surface''~\cite{GIMP-manual}.
For a square input image, think of drawing a circle of
radius $r$ around a center point $C$. Any point (pixel) $P$ belonging to
that disk (the region inside the circle) will have its value recalculated by taking
the value of another ``source'' pixel in the original image. The position of
that source pixel is found on the line that goes through $C$ and $P$, but
at some other distance $d_2$. Define $d_1$ to be the distance between $P$
and $C$. Then $d_2$ is given by $d_2 = sin(\frac{\pi{}d_1}{2r})^{-pinch} \times
d_1$, where $pinch$ is a parameter of the filter.
The actual value is given by bilinear interpolation considering the pixels
around the (non-integer) source position thus found.
Here $pinch \sim U[-complexity, 0.7 \times complexity]$.\\
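The $d_2$ formula can be sketched as follows; nearest-neighbour lookup stands in for the bilinear interpolation the filter actually uses, and the function name is ours.

```python
import numpy as np

def pinch(img, pinch_amt):
    """Remap each pixel P inside the circle of radius r about the
    center C: the source lies on the ray C->P at distance
    d2 = sin(pi*d1/(2r))**(-pinch) * d1, where d1 = |CP|."""
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    r = min(cx, cy)
    out = img.copy()
    for y in range(h):
        for x in range(w):
            d1 = np.hypot(x - cx, y - cy)
            if 0 < d1 < r:
                d2 = np.sin(np.pi * d1 / (2 * r)) ** (-pinch_amt) * d1
                sx = int(round(cx + (x - cx) * d2 / d1))
                sy = int(round(cy + (y - cy) * d2 / d1))
                if 0 <= sx < w and 0 <= sy < h:
                    out[y, x] = img[sy, sx]
    return out

img = np.random.RandomState(1).rand(32, 32)
pinched = pinch(img, 0.35)
```

A $pinch$ of 0 gives $d_2 = d_1$ everywhere, i.e. the identity, while negative values push sources inward and positive values pull them outward.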
{\large\bf Injecting Noise}\\
{\bf Motion Blur}\\
This GIMP filter is a ``linear motion blur'' in GIMP
terminology, with two parameters, $length$ and $angle$. The value of
a pixel in the final image is approximately the mean value of the first $length$ pixels
found by moving in the $angle$ direction.
Here $angle \sim U[0,360]$ degrees, and $length \sim {\rm Normal}(0,(3 \times complexity)^2)$.\\
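A simple NumPy approximation of this averaging is sketched below; rounding the stepped positions replaces the Bresenham lines GIMP uses, and taking the absolute value of the sampled $length$ is our convention.

```python
import numpy as np

def motion_blur(img, length, angle):
    """Each pixel becomes the mean of up to `length` pixels stepped
    along `angle` (degrees); out-of-image steps are skipped."""
    if length < 1:
        return img.copy()
    h, w = img.shape
    dy, dx = np.sin(np.radians(angle)), np.cos(np.radians(angle))
    out = np.zeros_like(img)
    for y in range(h):
        for x in range(w):
            vals = []
            for t in range(int(length)):
                sy, sx = int(round(y + t * dy)), int(round(x + t * dx))
                if 0 <= sy < h and 0 <= sx < w:
                    vals.append(img[sy, sx])
            out[y, x] = np.mean(vals) if vals else 0.0
    return out

rng = np.random.RandomState(0)
complexity = 0.5
angle = rng.uniform(0, 360)
length = abs(rng.normal(0, 3 * complexity))  # |Normal(0,(3c)^2)| sample
img = np.random.RandomState(1).rand(16, 16)
blurred = motion_blur(img, length, angle)
```

With $length \le 1$ the blur degenerates to the identity, which matches the fact that small $complexity$ mostly draws small lengths.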
{\bf Occlusion}\\
This filter selects a random rectangle from an {\em occluder} character
image and places it over the original {\em occluded} character
image. Pixels are combined by taking the $max(occluder, occluded)$,
i.e. the value closer to black. The rectangle corners
are sampled so that larger complexity gives larger rectangles.
The destination position in the occluded image is also sampled
according to a normal distribution (see more details in~\cite{ift6266-tr-anonymous}).
It has a probability of not being applied at all of 60\%.\\
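The following sketch shows the overall mechanics. The exact rectangle-size and jitter distributions are illustrative guesses of ours; the technical report cited above has the real ones.

```python
import numpy as np

def occlude(occluded, occluder, complexity, rng):
    """Paste a random rectangle of the occluder onto the occluded image,
    combining with a pixel-wise max; skipped 60% of the time."""
    if rng.rand() < 0.6:
        return occluded.copy()
    h, w = occluded.shape
    # rectangle grows (on average) with complexity -- illustrative choice
    rh = min(h - 1, 1 + int(rng.rand() * complexity * h))
    rw = min(w - 1, 1 + int(rng.rand() * complexity * w))
    y0 = rng.randint(0, h - rh)
    x0 = rng.randint(0, w - rw)
    patch = occluder[y0:y0 + rh, x0:x0 + rw]
    # destination jittered with normal noise, clipped inside the frame
    dy = int(np.clip(round(rng.normal(0, 3)), -y0, h - rh - y0))
    dx = int(np.clip(round(rng.normal(0, 2)), -x0, w - rw - x0))
    out = occluded.copy()
    ys, xs = y0 + dy, x0 + dx
    out[ys:ys + rh, xs:xs + rw] = np.maximum(out[ys:ys + rh, xs:xs + rw], patch)
    return out

rng = np.random.RandomState(0)
occluded = np.zeros((32, 32)); occluded[8:24, 15:17] = 1.0   # vertical bar
occluder = np.zeros((32, 32)); occluder[15:17, 8:24] = 1.0   # horizontal bar
merged = occlude(occluded, occluder, 0.5, rng)
```

Because the combination is a max, the occluded character is never erased, only overdrawn.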
{\bf Pixel Permutation}\\
This filter permutes neighbouring pixels. It first selects a fraction
$\frac{complexity}{3}$ of the pixels randomly in the image. Each of them is then
sequentially exchanged with another pixel in its $V4$ (four-connected) neighbourhood.
The numbers of exchanges to the left, right, top and bottom are equal, or differ
by at most 1 if the number of selected pixels is not a multiple of 4.
It has a probability of not being applied at all of 80\%.\\
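A small sketch of the swapping step follows; choosing the four directions uniformly at random is our approximation of the balanced left/right/top/bottom counts described above.

```python
import numpy as np

def permute_pixels(img, complexity, rng):
    """Swap a complexity/3 fraction of pixels with a random V4 neighbour."""
    out = img.copy()
    h, w = img.shape
    n = int(complexity / 3.0 * img.size)
    for _ in range(n):
        y, x = rng.randint(0, h), rng.randint(0, w)
        dy, dx = [(0, 1), (0, -1), (1, 0), (-1, 0)][rng.randint(0, 4)]
        y2, x2 = y + dy, x + dx
        if 0 <= y2 < h and 0 <= x2 < w:
            out[y, x], out[y2, x2] = out[y2, x2], out[y, x]
    return out

img = np.random.RandomState(1).rand(32, 32)
permuted = permute_pixels(img, 0.5, np.random.RandomState(0))
```

Since the filter only exchanges values, the multiset of pixel intensities is exactly preserved.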
{\bf Gaussian Noise}\\
This filter simply adds, to each pixel of the image independently, a
noise $\sim Normal(0,(\frac{complexity}{10})^2)$.
It has a probability of not being applied at all of 70\%.\\
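This one is a one-liner in NumPy; the function name is ours.

```python
import numpy as np

def gaussian_noise(img, complexity, rng):
    """Add i.i.d. N(0, (complexity/10)^2) noise to every pixel;
    does nothing 70% of the time."""
    if rng.rand() < 0.7:
        return img.copy()
    return img + rng.normal(0.0, complexity / 10.0, img.shape)

img = np.random.RandomState(1).rand(32, 32)
noisy = gaussian_noise(img, 0.5, np.random.RandomState(0))
```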
{\bf Background Images}\\
Following~\cite{Larochelle-jmlr-2009}, this transformation adds a random
background behind the letter. The background is chosen by first selecting,
at random, an image from a set of images. Then a 32$\times$32 subregion
of that image is chosen as the background image (by sampling position
uniformly while making sure not to cross image borders).
To combine the original letter image and the background image, contrast
adjustments are made. We first get the maximal values (i.e. maximal
intensity) for both the original image and the background image, $maximage$
and $maxbg$. We also have a parameter $contrast \sim U[complexity, 1]$.
Each background pixel value is multiplied by $\frac{max(maximage -
contrast, 0)}{maxbg}$ (higher contrast yields a darker
background). The output image pixels are $max(background, original)$.\\
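The combination rule can be sketched as follows; a random array stands in for the 32$\times$32 crop of a real background image, and the function name is ours.

```python
import numpy as np

def add_background(image, bg, complexity, rng):
    """Scale the background by max(maximage - contrast, 0)/maxbg with
    contrast ~ U[complexity, 1], then combine with a pixel-wise max."""
    contrast = rng.uniform(complexity, 1.0)
    maximage, maxbg = image.max(), bg.max()
    if maxbg > 0:
        bg = bg * (max(maximage - contrast, 0.0) / maxbg)
    return np.maximum(bg, image)

rng = np.random.RandomState(0)
char = np.zeros((32, 32)); char[8:24, 15:17] = 1.0
bg = np.random.RandomState(1).rand(32, 32)   # stands in for a 32x32 crop
combined = add_background(char, bg, 0.5, rng)
```

At $contrast = maximage$ the background is scaled to zero, so high contrast recovers the original character alone.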
{\bf Salt and Pepper Noise}\\
This filter adds noise $\sim U[0,1]$ to random subsets of pixels.
The proportion of selected pixels is $0.2 \times complexity$.
This filter has a probability of not being applied at all of 75\%.\\
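A minimal sketch (function name ours):

```python
import numpy as np

def salt_and_pepper(img, complexity, rng):
    """Replace a 0.2*complexity proportion of pixels with U[0,1] values;
    skipped entirely 75% of the time."""
    out = img.copy()
    if rng.rand() < 0.75:
        return out
    n = int(0.2 * complexity * img.size)
    idx = rng.choice(img.size, size=n, replace=False)
    out.flat[idx] = rng.uniform(0.0, 1.0, size=n)
    return out

img = np.random.RandomState(1).rand(32, 32)
salted = salt_and_pepper(img, 0.5, np.random.RandomState(0))
```

At most 20\% of the pixels (when $complexity = 1$) are ever randomized.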
{\bf Spatially Gaussian Noise}\\
Different regions of the image are spatially smoothed.
The image is convolved with a symmetric Gaussian kernel of
size and variance chosen uniformly in the ranges $[12,12 + 20 \times
complexity]$ and $[2,2 + 6 \times complexity]$. The result is normalized
between $0$ and $1$. We also create a symmetric averaging window, of the
kernel size, with maximum value at the center. For each image we sample
uniformly from $3$ to $3 + 10 \times complexity$ pixels that will be
averaging centers between the original image and the filtered one. We
initialize to zero a mask matrix of the image size. For each selected pixel
we add to the mask the averaging window centered on it. The final image is
computed from the following element-wise operation: $\frac{image + filtered\
image \times mask}{mask+1}$.
This filter has a probability of not being applied at all of 75\%.\\
{\bf Scratches}\\
The scratches module places line-like white patches on the image. The
lines are heavily transformed images of the digit ``1'' (one), chosen
at random among five thousand such images. The 1 image is
randomly cropped and rotated by an angle $\sim Normal(0,(100 \times
complexity)^2)$, using bicubic interpolation.
Two passes of a greyscale morphological erosion filter
are applied, reducing the width of the line
by an amount controlled by $complexity$.
This filter is applied only 15\% of the time. When it is applied, 50\%
of the time only one patch image is generated and applied. In 30\% of
cases, two patches are generated, and otherwise three patches are
generated. The patch is applied by taking the maximal value of any given
patch or the original image, for each of the $32\times32$ pixel locations.\\
{\bf Color and Contrast Changes}\\
This filter changes the contrast and may invert the image polarity (white
on black to black on white). The contrast $C$ is defined here as the
difference between the maximum and the minimum pixel value of the image.
We sample $C \sim U[1-0.85 \times complexity,1]$ (this ensures a minimum
contrast of $0.15$). The image is normalized into
$[\frac{1-C}{2},1-\frac{1-C}{2}]$. The polarity is inverted with
probability $0.5$.
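The normalization and inversion can be sketched as follows (function name ours; the image is first brought to $[0,1]$ so the target range is hit exactly).

```python
import numpy as np

def change_contrast(img, complexity, rng):
    """Rescale pixel values into [(1-C)/2, 1-(1-C)/2] with
    C ~ U[1-0.85*complexity, 1]; invert polarity half the time."""
    C = rng.uniform(1 - 0.85 * complexity, 1.0)
    lo, hi = (1 - C) / 2.0, 1 - (1 - C) / 2.0
    mn, mx = img.min(), img.max()
    if mx > mn:
        img = (img - mn) / (mx - mn)   # bring the image to [0, 1] first
    out = lo + img * (hi - lo)
    if rng.rand() < 0.5:
        out = 1.0 - out                # polarity inversion
    return out

img = np.random.RandomState(1).rand(32, 32)
adjusted = change_contrast(img, 1.0, np.random.RandomState(0))
```

The output range is symmetric about $0.5$, so polarity inversion maps it onto itself.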


\begin{figure}[h]
\resizebox{.99\textwidth}{!}{\includegraphics{images/example_t.png}}\\
SDA1 & 17.1\% $\pm$.13\% & 29.7\%$\pm$.3\% & 29.7\%$\pm$.3\% & 1.4\% $\pm$.1\%\\ \hline
SDA2 & 18.7\% $\pm$.13\% & 33.6\%$\pm$.3\% & 39.9\%$\pm$.17\% & 1.7\% $\pm$.1\%\\ \hline
MLP0 & 24.2\% $\pm$.15\% & 68.8\%$\pm$.33\% & 78.70\%$\pm$.14\% & 3.45\% $\pm$.15\% \\ \hline
MLP1 & 23.0\% $\pm$.15\% & 41.8\%$\pm$.35\% & 90.4\%$\pm$.1\% & 3.85\% $\pm$.16\% \\ \hline
MLP2 & 24.3\% $\pm$.15\% & 46.0\%$\pm$.35\% & 54.7\%$\pm$.17\% & 4.85\% $\pm$.18\% \\ \hline
\cite{Granger+al-2007} & & & & 4.95\% $\pm$.18\% \\ \hline
\cite{Cortes+al-2000} & & & & 3.71\% $\pm$.16\% \\ \hline
\cite{Oliveira+al-2002} & & & & 2.4\% $\pm$.13\% \\ \hline
\cite{Migram+al-2005} & & & & 2.1\% $\pm$.12\% \\ \hline
\end{tabular}
\end{center}
\end{table}

\subsection{Perturbed Training Data More Helpful for SDAE}