Generative versus sampling-based approaches to variability of class imbalance in visual anomaly detection

Date

2019-05-01

Abstract

Data sets for visual anomaly detection are often stratified such that each stratum, or batch, suffers from class imbalance of a different magnitude. A common approach to this detection task is supervised inductive learning from labeled or partially labeled image data, which simultaneously segments the anomaly and classifies it. Many representations and algorithms for these learning tasks exhibit a preference (inductive bias) towards balanced data from each class and thus perform better with balanced data sets than with imbalanced ones. Such representations and algorithms are sensitive not only to the aggregate degree of class imbalance but also to its within-stratum variation. This includes learned representations, such as those produced by deep learning for intermediate visual features.

Several oversampling-based techniques have been proposed to mitigate the skewness of the data. However, most synthetic oversampling techniques, such as the Synthetic Minority Over-sampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN), are suitable only for low-dimensional data, which limits their application in visual anomaly detection. Recently, deep generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) have been established as effective approaches to augmenting high-dimensional image data. However, the literature lacks a detailed study of the learning process on a data set augmented to cope with variable imbalance across strata. We carried out an experiment to analyze the training phase and the final classifier performance when the more imbalanced batch is augmented, using different approaches, to achieve the same class ratio as the less imbalanced batch. We took classification on the merged batches as the baseline and compared it with the performance of the classifier on data sets augmented by simple oversampling, an adaptation of SMOTE, and a GAN-based generative model. Our results indicate that GAN-based augmentation is capable of avoiding overfitting and leads to better performance.
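
The two sampling-based augmentations mentioned above can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the toy two-dimensional points, and the batch counts are all hypothetical, and the interpolation step is plain SMOTE-style interpolation between nearest minority neighbors, not the adaptation of SMOTE used in the experiment. The sketch first computes how many minority samples the more imbalanced batch needs to reach the class ratio of a reference batch, then produces that many samples by duplication and by interpolation.

```python
import random


def samples_needed(n_min, n_maj, ref_min, ref_maj):
    """Extra minority samples needed so that n_min / n_maj reaches
    the reference ratio ref_min / ref_maj (integer ceiling division)."""
    target = -(-ref_min * n_maj // ref_maj)
    return max(0, target - n_min)


def random_oversample(minority, k, rng):
    """Simple oversampling: draw k existing minority samples with replacement."""
    return [list(rng.choice(minority)) for _ in range(k)]


def smote_like(minority, k, rng, n_neighbors=2):
    """SMOTE-style oversampling: interpolate between a minority sample
    and one of its nearest minority neighbors (needs at least 2 samples)."""
    synthetic = []
    for _ in range(k):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Rank the other minority samples by squared Euclidean distance to x.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(x, p)))
        z = rng.choice(others[:n_neighbors])
        lam = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + lam * (b - a) for a, b in zip(x, z)])
    return synthetic


# Hypothetical toy batch: 5 anomalous vs. 100 normal samples, augmented to
# match a reference batch with 20 anomalous vs. 100 normal samples.
rng = random.Random(0)
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
k = samples_needed(n_min=len(minority), n_maj=100, ref_min=20, ref_maj=100)
duplicates = random_oversample(minority, k, rng)
interpolated = smote_like(minority, k, rng)
```

Duplication only repeats existing samples, while interpolation creates new points on line segments between neighbors. On raw pixels, such interpolated points need not resemble realistic images, which is one way to read the abstract's remark that these techniques suit low-dimensional data.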

Keywords

Variability of class imbalance, Sampling versus generative, Data augmentation, Visual anomaly detection, Generative adversarial network, Over-sampling and under-sampling

Graduation Month

May

Degree

Master of Science

Department

Department of Computer Science

Major Professor

William H. Hsu

Date

2019

Type

Thesis
