Generative versus sampling-based approaches to variability of class imbalance in visual anomaly detection

Nafi, Nasik Muhammad

Files

NasikMuhammadNafi2019.pdf (13.35 MB)

Date

2019-05-01

Authors

Nafi, Nasik Muhammad

Abstract

Data sets for visual anomaly detection are often stratified such that each stratum, or batch, in the data set suffers from a different degree of class imbalance. A common approach to this detection task is to use supervised inductive learning from labeled or partially labeled image data to simultaneously segment the anomaly and classify it. Many representations and algorithms for these learning tasks exhibit some preference (inductive bias) towards balanced data from each class and thus perform better with balanced data sets than with imbalanced ones. Such representations and algorithms are sensitive not only to the aggregate degree of class imbalance but also to its within-stratum variation. This sensitivity extends to representation learning, such as deep learning of intermediate visual features.

Several oversampling-based techniques have been proposed to mitigate the skewness of the data. However, most synthetic oversampling techniques, such as the Synthetic Minority Over-sampling Technique (SMOTE) or Adaptive Synthetic Sampling (ADASYN), are suitable only for low-dimensional data, which limits their application in visual anomaly detection. Recently, deep generative models such as Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) have been established as effective approaches to augmenting high-dimensional image data. However, the literature lacks a detailed study of the learning process on a data set augmented to cope with variable imbalance across strata. We carried out an experiment to analyze the training phase and the final classifier performance when the more imbalanced batch is augmented, using different approaches, to achieve the same class ratio as the less imbalanced batch. We used classification on the merged batches as the baseline and compared it with the performance of the classifier on data sets augmented by simple oversampling, an adaptation of SMOTE, and a GAN-based generative model. Our results indicate that GAN-based augmentation is capable of avoiding overfitting and leads to better performance.
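As an illustration of the simplest of the compared strategies, the sketch below shows one way simple oversampling could bring the more imbalanced batch to the same minority-to-majority ratio as the less imbalanced batch. It is a minimal sketch and not the thesis implementation: the function names `count_classes` and `oversample_to_match`, the `(sample, label)` pair representation, and the use of label `1` for the anomalous (minority) class are all assumptions made for this example.

```python
import random


def count_classes(batch, minority_label):
    """Return (majority_count, minority_count) for a list of (sample, label) pairs."""
    n_min = sum(1 for _, label in batch if label == minority_label)
    return len(batch) - n_min, n_min


def oversample_to_match(batch, reference, minority_label=1, seed=0):
    """Duplicate random minority items in `batch` until its minority/majority
    ratio is at least that of `reference` (hypothetical helper, assumed API)."""
    ref_maj, ref_min = count_classes(reference, minority_label)
    maj, mino = count_classes(batch, minority_label)
    if ref_maj == 0 or mino == 0:
        raise ValueError("reference needs majority items; batch needs minority items")

    # Smallest minority count meeting the reference ratio, using integer
    # ceiling division to avoid floating-point rounding.
    target_min = -(-maj * ref_min // ref_maj)
    extra = max(0, target_min - mino)

    rng = random.Random(seed)
    minority_items = [item for item in batch if item[1] == minority_label]
    duplicates = [rng.choice(minority_items) for _ in range(extra)]
    return batch + duplicates


# Toy example: file names stand in for images; label 1 marks an anomaly.
less_imbalanced = [(f"a{i}.png", 0) for i in range(100)] + [(f"b{i}.png", 1) for i in range(30)]
more_imbalanced = [(f"c{i}.png", 0) for i in range(200)] + [(f"d{i}.png", 1) for i in range(10)]

augmented = oversample_to_match(more_imbalanced, less_imbalanced)
print(count_classes(augmented, 1))
```

Because this strategy only repeats existing minority images, it adds no new visual variation. The SMOTE adaptation and the GAN-based model compared in the abstract would instead replace the duplication step with synthesized samples.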

Keywords

Variability of class imbalance, Sampling versus generative, Data augmentation, Visual anomaly detection, Generative adversarial network, Over-sampling and under-sampling

Graduation Month

May

Degree

Master of Science

Department

Department of Computer Science

Major Professor

William H. Hsu

Date

2019

Type

Thesis

URI

http://hdl.handle.net/2097/39692

Collections

K-State Electronic Theses, Dissertations, and Reports: 2004 -
