CLASS IMBALANCE AND ACTIVE LEARNING

2013-01-01
Attenberg, Josh
Ertekin Bolelli, Şeyda
The performance of a predictive model is tightly coupled with the data used during training. While using more training examples will often yield a better-informed, more accurate model, limits on computer memory and the real-world costs of gathering labeled examples often constrain the amount of data that can be used for training. In settings where the number of training examples is limited, it becomes important to consider carefully which examples are selected. In active learning (AL), the model itself plays a hands-on role in selecting examples for labeling from a large pool of unlabeled examples. These examples are then used for model training. Numerous studies have demonstrated, both empirically and theoretically, the benefits of AL: given a fixed budget, a training system that interactively involves the current model in selecting the training examples can often achieve far greater accuracy than a system that simply selects training examples at random. Imbalanced settings provide special opportunities and challenges for AL. For example, while AL can be used to build models that counteract the harmful effects of learning under class imbalance, extreme class imbalance can cause an AL strategy to "fail," preventing the selection scheme from choosing any useful examples for labeling. This chapter focuses on the interaction between AL and class imbalance, discussing (i) AL techniques designed specifically for dealing with imbalanced settings, (ii) strategies that leverage AL to overcome the deleterious effects of class imbalance, and (iii) how extreme class imbalance can prevent AL systems from selecting useful examples, along with alternatives to AL in these cases.
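The pool-based selection loop described in the abstract can be illustrated with a minimal sketch. This is not the chapter's method: it assumes uncertainty sampling (query the example whose predicted probability is closest to 0.5) as the selection rule, a one-dimensional logistic model fitted by plain gradient descent, and synthetic data. All function names (`make_imbalanced_pool`, `fit_logistic_1d`, `most_uncertain`, `active_learning_loop`) are hypothetical and introduced only for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_imbalanced_pool(n_neg, n_pos, seed=0):
    """Synthetic 1-D pool (an assumption): negatives near -1, positives near +2."""
    rng = random.Random(seed)
    xs = [rng.gauss(-1.0, 1.0) for _ in range(n_neg)]
    xs += [rng.gauss(2.0, 1.0) for _ in range(n_pos)]
    ys = [0] * n_neg + [1] * n_pos
    return xs, ys


def fit_logistic_1d(xs, ys, epochs=200, lr=0.1):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


def most_uncertain(pool_indices, predict_proba, k=1):
    """Return the k pool indices whose predicted probability is closest to 0.5."""
    return sorted(pool_indices, key=lambda i: abs(predict_proba(i) - 0.5))[:k]


def active_learning_loop(xs, oracle, seed_indices, budget):
    """Spend `budget` label queries, one at a time, on the most uncertain example."""
    labeled = {i: oracle(i) for i in seed_indices}
    pool = [i for i in range(len(xs)) if i not in labeled]
    for _ in range(budget):
        if not pool:  # nothing left to query
            break
        idx = list(labeled)
        w, b = fit_logistic_1d([xs[i] for i in idx], [labeled[i] for i in idx])
        pick = most_uncertain(pool, lambda i: sigmoid(w * xs[i] + b), k=1)[0]
        labeled[pick] = oracle(pick)  # the costly labeling step
        pool.remove(pick)
    return labeled
```

A call such as `active_learning_loop(xs, lambda i: ys[i], seed_indices=[0, 95], budget=10)` on a pool built with `make_imbalanced_pool(95, 5)` queries ten labels beyond the two seeds. Note that the seed set here deliberately contains one example of each class; under extreme imbalance a randomly drawn seed set may contain no minority examples at all, which is one way the selection scheme can fail to find useful examples.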
IMBALANCED LEARNING: FOUNDATIONS, ALGORITHMS, AND APPLICATIONS

Citation Formats
J. Attenberg and Ş. Ertekin Bolelli, “CLASS IMBALANCE AND ACTIVE LEARNING,” IMBALANCED LEARNING: FOUNDATIONS, ALGORITHMS, AND APPLICATIONS, pp. 101–149, 2013, Accessed: 00, 2020. [Online]. Available: https://hdl.handle.net/11511/53216.