How to detect online hate towards migrants and refugees? Developing and evaluating a classifier of racist and xenophobic hate speech using shallow and deep learning

Arcila Calderón, Carlos; Jiménez Amores, Francisco Javier; Sánchez Holgado, Patricia; Vrysis, Lazaros; Vryzas, Nikolaos; Oller Alonso, Martín

doi:10.3390/su142013094

Author(s)

Arcila Calderón, Carlos

Jiménez Amores, Francisco Javier

Sánchez Holgado, Patricia

Vrysis, Lazaros

Vryzas, Nikolaos

Oller Alonso, Martín

Keywords

Deep learning

Social media

Migration

Hate speech

Racism

Xenophobia

UNESCO classification

63 Sociology

6308 Social Communications

Publication date

2022-10-13

Citation

Arcila-Calderón C, Amores JJ, Sánchez-Holgado P, Vrysis L, Vryzas N, Oller Alonso M. (2022) How to Detect Online Hate towards Migrants and Refugees? Developing and Evaluating a Classifier of Racist and Xenophobic Hate Speech Using Shallow and Deep Learning. Sustainability. 14(20):13094. https://doi.org/10.3390/su142013094

Abstract

[EN] Hate speech spreading online is a matter of growing concern since social media allows for its rapid, uncontrolled, and massive dissemination. For this reason, several researchers are already working on the development of prototypes that allow for the detection of cyberhate automatically and on a large scale. However, most of them are developed to detect hate only in English, and very few focus specifically on racism and xenophobia, the category of discrimination in which the most hate crimes are recorded each year. In addition, ad hoc datasets manually generated by several trained coders are rarely used in the development of these prototypes since almost all researchers use already available datasets. The objective of this research is to overcome the limitations of those previous works by developing and evaluating classification models capable of detecting racist and/or xenophobic hate speech being spread online, first in Spanish, and later in Greek and Italian. In the development of these prototypes, three differentiated machine learning strategies are tested. First, various traditional shallow learning algorithms are used. Second, deep learning is used, specifically, an ad hoc developed RNN model. Finally, a BERT-based model is developed in which transformers and neural networks are used. The results confirm that deep learning strategies perform better in detecting anti-immigration hate speech online. It is for this reason that the deep architectures were the ones finally improved and tested for hate speech detection in Greek and Italian and in multisource. The results of this study represent an advance in the scientific literature in this field of research, since up to now, no online anti-immigration hate detectors had been tested in these languages and using this type of deep architecture.

URI

http://hdl.handle.net/10366/160894

DOI

10.3390/su142013094

Publisher's version

https://www.mdpi.com/2071-1050/14/20/13094

Appears in collections