Experimental design under comparisons

Title:
Experimental design under comparisons
Creator:
Guo, Yuan (Author)
Contributor:
Ioannidis, Stratis (Advisor)
Dy, Jennifer (Committee member)
Erdogmus, Deniz (Committee member)
Language:
English
Publisher:
Boston, Massachusetts : Northeastern University, 2019
Date Accepted:
August 2019
Date Awarded:
August 2019
Type of resource:
Text
Genre:
Dissertations
Format:
electronic
Digital origin:
born digital
Abstract/Description:
Labels generated by human experts via comparisons exhibit smaller variance than traditional sample labels. Collecting comparison labels is challenging over large datasets, however, as the number of comparisons grows quadratically with the dataset size. We study the following experimental design problem: given a budget of expert comparisons and a set of existing sample labels, determine the comparison labels to collect that lead to the highest classification improvement. We study several experimental design objectives motivated by the Bradley-Terry model; the resulting optimization problems amount to maximizing submodular functions. In particular, we study a natural experimental design objective, namely, D-optimality. This objective is known to perform well in practice and is submodular, making the selection approximable via the greedy algorithm. A naive greedy implementation has O(Nd^2K) complexity, where N is the dataset size, d is the feature space dimension, and K is the number of generated comparisons. We show that, by exploiting the inherent geometry of the dataset, namely, that it consists of pairwise comparisons, the greedy algorithm's complexity can be reduced to O(N^2(K + d) + N(dK + d^2) + d^2K). We apply the same acceleration to the so-called lazy greedy algorithm. Combined, these improvements lead to an execution time of less than 1 hour for a dataset with 10^8 comparisons; the naive greedy algorithm on the same dataset would require more than 10 days to terminate. Extending pairwise comparisons to general rankings, we also propose a Bayesian inference model for ranking datasets, allowing us to take a probabilistic approach to ranking inference. Our probabilistic assumptions are motivated by, and consistent with, the so-called Plackett-Luce model. We propose a variational inference method to extract a closed-form Gaussian posterior distribution, and we show experimentally that this closed-form posterior yields more reliable ranking predictions than prediction via point estimates.
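
To make the selection procedure concrete, the sketch below illustrates greedy maximization of a D-optimality (log-determinant) objective over candidate pairwise comparisons, in the spirit of the naive greedy algorithm described in the abstract. It is a minimal Python/NumPy sketch, not the dissertation's implementation: representing each comparison by a feature-difference vector, the regularization term, and all names (d_optimal_greedy, diff_vectors, reg) are illustrative assumptions.

import numpy as np

def d_optimal_greedy(diff_vectors, budget, reg=1e-3):
    # diff_vectors: array of shape (M, d), one feature-difference vector per
    # candidate comparison; budget: number of comparisons to select;
    # reg: regularization keeping the information matrix invertible.
    d = diff_vectors.shape[1]
    A = reg * np.eye(d)                      # regularized information matrix
    selected, remaining = [], set(range(len(diff_vectors)))
    for _ in range(budget):
        best_gain, best_idx = -np.inf, None
        for i in remaining:
            x = diff_vectors[i]
            # Marginal gain of adding x x^T, via the matrix determinant lemma:
            # log det(A + x x^T) - log det(A) = log(1 + x^T A^{-1} x).
            gain = np.log1p(x @ np.linalg.solve(A, x))
            if gain > best_gain:
                best_gain, best_idx = gain, i
        A += np.outer(diff_vectors[best_idx], diff_vectors[best_idx])
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected

# Toy usage on synthetic feature-difference vectors (hypothetical data):
# rng = np.random.default_rng(0)
# X = rng.normal(size=(200, 10))
# pairs = X[:100] - X[100:]
# print(d_optimal_greedy(pairs, budget=5))

Each greedy step evaluates the marginal log-determinant gain of every remaining candidate, which is where the quadratic-per-step cost arises; the accelerations discussed in the abstract (including the lazy greedy variant) reduce this cost by exploiting the pairwise-difference structure of the candidates.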
Subjects and keywords:
Active Learning
Experimental Design
Submodular Optimization
Variational Inference
Electrical engineering
Computer engineering
DOI:
https://doi.org/10.17760/D20327424
Permanent URL:
http://hdl.handle.net/2047/D20327424
Use and reproduction:
In Copyright: This Item is protected by copyright and/or related rights. You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the right-holder(s). (http://rightsstatements.org/vocab/InC/1.0/)
Copyright restrictions may apply.