Hurtlex: A multilingual lexicon of words to hurt

Basile, Valerio; Patti, Viviana
2018-01-01

Abstract

We describe the creation of HurtLex, a multilingual lexicon of hate words. The starting point is the Italian hate lexicon developed by the linguist Tullio De Mauro, which is organized into 17 categories. The lexicon was expanded by linking it to available synset-based computational lexical resources such as MultiWordNet and BabelNet, and extended to multiple languages through semi-automatic translation and expert annotation. We provide a twofold evaluation of HurtLex as a resource for hate speech detection in social media: a qualitative evaluation against an annotated Italian Twitter corpus of hate against immigrants, and an extrinsic evaluation in the context of the AMI@Ibereval2018 shared task, where the resource was used to extract domain-specific lexicon-based features for the supervised classification of misogyny in English and Spanish tweets.
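
The idea of lexicon-based features mentioned in the abstract can be illustrated with a minimal sketch. It assumes a lexicon that maps each word to a single category label and builds one count per category for a given text. The toy entries, the labels `cat_a` and `cat_b`, and the function names are hypothetical placeholders, not actual HurtLex content or the feature set used in the shared task.

```python
import re
from collections import Counter

# Hypothetical toy lexicon (word -> category label). The entries and the
# labels are illustrative placeholders, NOT real HurtLex data.
TOY_LEXICON = {
    "idiot": "cat_a",
    "stupid": "cat_a",
    "pig": "cat_b",
}

# Fixed, sorted category order so every feature vector has the same layout.
CATEGORIES = sorted(set(TOY_LEXICON.values()))


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexicon_features(text, lexicon=TOY_LEXICON, categories=CATEGORIES):
    """Return one count per category: how many tokens fall in that category."""
    counts = Counter(lexicon[tok] for tok in tokenize(text) if tok in lexicon)
    return [counts.get(cat, 0) for cat in categories]
```

A vector produced this way could be concatenated with other features, such as bag-of-words counts, before being passed to a supervised classifier. Which classifier and which additional features to use is left open here.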
Year: 2018
Conference: 5th Italian Conference on Computational Linguistics, CLiC-it 2018
Location: Turin, Italy
Series: CEUR Workshop Proceedings
Publisher: CEUR-WS
Volume: 2253
Pages: 1-6
URL: http://ceur-ws.org/Vol-2253/paper49.pdf
Keywords: hate speech detection, hate lexicon, corpora, social media, Italian, Twitter
Authors: Bassignana, Elisa; Basile, Valerio; Patti, Viviana
Files in this item:
File: paper49-clic2018.pdf
Access: Open access
File type: Publisher's PDF
Size: 278.34 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/2318/1684807
Citations
  • Scopus 42