Computational methods for bias reduction in surveys
Metadata
Author
Castro Martín, Luis
Publisher
Universidad de Granada
Advisor
Rueda García, María Del Mar
Department
Universidad de Granada. Programa de Doctorado en Estadística Matemática y Aplicada
Subject
Bias reduction
Surveys
Date
2022
Defense date
2022-07-01
Bibliographic reference
Castro Martín, Luis. Computational methods for bias reduction in surveys. Granada: Universidad de Granada, 2022. [http://hdl.handle.net/10481/75960]
Sponsor
Tesis Univ. Granada.
Abstract
Probability sampling has long been the fundamental framework for
carrying out surveys from which reliable conclusions can be drawn
and properly justified. However, the application of its basic principles is now
being threatened by the surge of new technologies.
Online surveys are becoming a standard due to their ability to obtain big
data in a simple, cheap and efficient manner. However, the methodologies
associated with these kinds of surveys are usually non-probabilistic. Often, a
link to the questionnaire is publicly shared, following a snowball sampling
design, so no representative design weights are available. This causes a
substantial self-selection bias. Even when a sampling frame is available,
the low response rates associated with the lack of human interaction
produce a substantial non-response bias. Finally, coverage biases are also
common because part of the target population does not have access to some
of the required means, whether an internet connection, a smartphone
or an account on a specific social network.
Despite all these problems, their use is widespread. Moreover, the decline
in the response rates of traditional surveys in recent years has undermined
the viability of the alternatives. Therefore, great effort has been spent
on developing techniques that reduce bias in non-probability
surveys. The objective is to propose new methodologies that preserve
the credibility of statistical studies while also making use of the advantages
of new technologies.
The main proposals for this purpose are Propensity Score Adjustment,
which estimates the inclusion probabilities in order to obtain representative
sample weights, and Statistical Matching, which is based on predicting
and imputing individuals' responses. Both rely on an auxiliary probability
sample that shares some covariates with the non-probability
sample, which is the one containing the target variable of interest.
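To make the two ideas concrete, the following is a minimal sketch on hypothetical toy data, not the thesis's implementation and not the NonProbEst API. It assumes a single binary covariate `x`, so the propensity and prediction models reduce to simple per-cell proportions and means; the thesis itself considers advanced machine learning models instead. All names (`reference`, `volunteers`, `psa_estimate`, `matching_estimate`) are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical toy data (not taken from the thesis).
# Reference probability sample: (x, design_weight); the target y is NOT observed.
reference = [(1, 100.0)] * 5 + [(0, 100.0)] * 5
# Non-probability (volunteer) sample: (x, y); no design weights exist.
# Units with x == 1 are over-represented and tend to have y == 1.
volunteers = [(1, 1)] * 6 + [(1, 0)] * 2 + [(0, 0)] * 2


def cell_totals(ref):
    """Design-weighted population size estimated for each covariate cell."""
    totals = defaultdict(float)
    for x, d in ref:
        totals[x] += d
    return dict(totals)


def psa_estimate(ref, vol):
    """Propensity Score Adjustment with a saturated (per-cell) propensity model.

    Assumes every covariate cell seen in `vol` also appears in `ref`.
    """
    pop = cell_totals(ref)
    counts = defaultdict(int)
    for x, _ in vol:
        counts[x] += 1
    # Estimated propensity of belonging to the volunteer sample,
    # computed on the pooled data with the reference sample design-weighted.
    propensity = {x: counts[x] / (counts[x] + pop[x]) for x in counts}
    # One common weighting choice: the inverse odds (1 - pi) / pi.
    weights = {x: (1.0 - p) / p for x, p in propensity.items()}
    num = sum(weights[x] * y for x, y in vol)
    den = sum(weights[x] for x, _ in vol)
    return num / den  # weighted (Hajek-type) mean of y


def matching_estimate(ref, vol):
    """Statistical Matching: predict y from x, impute it in the reference sample.

    Assumes every covariate cell seen in `ref` also appears in `vol`.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in vol:
        sums[x] += y
        counts[x] += 1
    # The "model" here is just the mean of y within each cell of x.
    predicted = {x: sums[x] / counts[x] for x in counts}
    num = sum(d * predicted[x] for x, d in ref)
    den = sum(d for _, d in ref)
    return num / den  # design-weighted mean of the imputed values


naive = sum(y for _, y in volunteers) / len(volunteers)
print("naive mean:        ", naive)
print("PSA estimate:      ", psa_estimate(reference, volunteers))
print("matching estimate: ", matching_estimate(reference, volunteers))
```

On this toy data the unadjusted mean is 0.6, while both adjusted estimates come out at about 0.375, because the over-represented cell `x == 1` is weighted back down to its estimated population share. The two estimators coincide here only because the per-cell model is saturated; with richer covariates and flexible models they generally differ.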
We contribute to the development of these techniques by proposing computational
methods which significantly improve their efficacy. First, we consider
their application with different advanced machine learning models, culminating
in state-of-the-art techniques which optimize the results obtained. We
also propose a novel method for combining Propensity Score Adjustment and
Statistical Matching, improving the bias reduction obtained with each
method separately. We implement many of these methods along with other
bias reduction alternatives for non-probability surveys in NonProbEst, an
easy-to-use R package.
Additionally, we extend their application to more contexts. The Propensity
Score Adjustment method, combined with calibration techniques, can
be considered for overlapping panel surveys in order to obtain cross-sectional as
well as longitudinal estimates over time. This compensates for the bias resulting
from non-response in successive measurements. In this way we propose
several reliable estimators which are then applied to diverse parameters
of interest in a research project about the evolution of COVID-19. We also
consider a scenario in which the auxiliary probabilistic sample includes the
target variable as well. An extensive comparative study is carried out with
different possible strategies. The results show the benefits of the proposed
methodologies.
Note: This thesis is presented as a compendium of six publications
related to its contents. The full versions of the papers are
included in Appendices A1 - A6.