Files in this item

 Download all files in item (5.19 MB)
This item is publicly available and licensed under Creative Commons - Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).
Name: IMSyPP_SI_anotacije_training-clarin.csv
Size: 4.27 MB
Format: CSV file
Description: Training dataset: Slovenian Twitter sample labeled for hate speech type and target.
MD5: 0fe160a1e9ab82a723ba9b2ffee90f78
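The MD5 values listed for each file on this page can be used to check a download for corruption. The following is a minimal sketch, assuming the four files were saved under their listed names in one directory; the helper names `md5_of_file` and `verify_downloads` are hypothetical and not part of the dataset or the repository.

```python
import hashlib
from pathlib import Path

# MD5 checksums exactly as listed on this page, keyed by file name.
EXPECTED_MD5 = {
    "IMSyPP_SI_anotacije_training-clarin.csv": "0fe160a1e9ab82a723ba9b2ffee90f78",
    "IMSyPP_SI_anotacije_evaluation-report.txt": "a84c9b7a6c5cf99d0402160ec0d70273",
    "IMSyPP_SI_anotacije_evaluation-clarin.csv": "d1a0daa22905e4f1b582cd324b4c0074",
    "IMSyPP_SI_anotacije_training-report.txt": "756517ba847f27ea5decae1f1bc7f46c",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_downloads(directory="."):
    """Map each listed file name to True, False, or None (file not found)."""
    results = {}
    for name, expected in EXPECTED_MD5.items():
        path = Path(directory) / name
        if path.is_file():
            results[name] = (md5_of_file(path) == expected)
        else:
            results[name] = None
    return results


if __name__ == "__main__":
    labels = {True: "OK", False: "MISMATCH", None: "missing"}
    for name, ok in verify_downloads().items():
        print(f"{labels[ok]:8}  {name}")
```

Run from the download directory, the script prints one status line per file. MD5 is suitable here only as an integrity check against accidental corruption, not as protection against tampering.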
Name: IMSyPP_SI_anotacije_evaluation-report.txt
Size: 27.86 KB
Format: Text file
Description: Evaluation dataset: annotation agreement scores for the evaluation dataset.
MD5: a84c9b7a6c5cf99d0402160ec0d70273
 File Preview  
Agreement report for both annotation questions: hate speech type (vrsta) and target (tarča) for the data in the file IMSyPP_SI_anotacije_evaluation-clarin.csv.

Hate speech types (vrsta):
0 appropriate (ni sporni govor)
1 inappropriate (nespodobni govor)
2 offensive (žalitev)
3 violent (nasilje)

Hate speech targets (tarča):
1 racism (ksenofobija in rasizem)
2 migrants (begunci/migranti)
3 islamophobia (islamofobija)
4 antisemitism (antisemitizem)
5 religion (druge religije)
6 homophobia (homofobija)
7 sexism (seksizem)
8 ideology (ideologija)
9 media (novinarji in mediji)
10 politics (politika/-i)
11 individual (posameznik)
12 other (drugo)
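The two code lists above can be written as lookup tables for decoding annotations. This is a sketch only: the codes and English labels are taken from the report, but how the CSV files store the annotations (numeric codes or text labels, and under which column names) is not shown on this page, so the `decode` helper is hypothetical and assumes numeric codes.

```python
# Codes and English labels as listed in the agreement report.
HATE_SPEECH_TYPE = {
    0: "appropriate",
    1: "inappropriate",
    2: "offensive",
    3: "violent",
}

HATE_SPEECH_TARGET = {
    1: "racism",
    2: "migrants",
    3: "islamophobia",
    4: "antisemitism",
    5: "religion",
    6: "homophobia",
    7: "sexism",
    8: "ideology",
    9: "media",
    10: "politics",
    11: "individual",
    12: "other",
}


def decode(code, table):
    """Return the label for a code, or None if the code is missing or unknown.

    Accepts ints or numeric strings (assumption: values read from a CSV
    arrive as strings such as "2").
    """
    try:
        return table[int(code)]
    except (KeyError, ValueError, TypeError):
        return None


print(decode("2", HATE_SPEECH_TYPE))    # offensive
print(decode(10, HATE_SPEECH_TARGET))   # politics
```

Note that the target list starts at 1, so code 0 has no target label in this scheme.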

Annotated instances

a0 2000/2000
a1 2000/2000
a2 2000/2000
a3 2000/2000
a4 2000/2000
a5 2000/2000
a6 2000/2000
a7 2000/2000
a8 2000/2000
a9 2000/2000

-----------------
-----OVERALL-----
-----------------
Annotated for  vrsta : 20000
0 ni sporni govor     13273
1 nespodobni govor      285
2 žalitev . . .
                                            
Name: IMSyPP_SI_anotacije_evaluation-clarin.csv
Size: 883.06 KB
Format: CSV file
Description: Evaluation dataset: Slovenian Twitter random sample labeled for hate speech type and target.
MD5: d1a0daa22905e4f1b582cd324b4c0074
Name: IMSyPP_SI_anotacije_training-report.txt
Size: 27.49 KB
Format: Text file
Description: Training dataset: annotation agreement scores for the training dataset.
MD5: 756517ba847f27ea5decae1f1bc7f46c
 File Preview  
Agreement report for both annotation questions: hate speech type (vrsta) and target (tarča) for the data in the file IMSyPP_SI_anotacije_training-clarin.csv.

Hate speech types (vrsta):
0 appropriate (ni sporni govor)
1 inappropriate (nespodobni govor)
2 offensive (žalitev)
3 violent (nasilje)

Hate speech targets (tarča):
1 racism (ksenofobija in rasizem)
2 migrants (begunci/migranti)
3 islamophobia (islamofobija)
4 antisemitism (antisemitizem)
5 religion (druge religije)
6 homophobia (homofobija)
7 sexism (seksizem)
8 ideology (ideologija)
9 media (novinarji in mediji)
10 politics (politika/-i)
11 individual (posameznik)
12 other (drugo)

Annotated instances

a0   9997 / 10000
a1   9950 / 10000
a2   9929 / 10000
a3   9992 / 10000
a4   10000 / 10000
a5   10000 / 10000
a6   9998 / 10000
a7   9979 / 10000
a8   9973 / 10000
a9   9991 / 10000
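The per-annotator counts can be cross-checked against the overall totals the two reports give for vrsta (99809 for the training set, 20000 for the evaluation set). The sketch below copies the counts from the report previews on this page; the function names are hypothetical, and the completion rate is an illustrative measure that the reports themselves do not state.

```python
# Annotated instances per annotator, copied from the report previews.
TRAINING = {
    "a0": 9997, "a1": 9950, "a2": 9929, "a3": 9992, "a4": 10000,
    "a5": 10000, "a6": 9998, "a7": 9979, "a8": 9973, "a9": 9991,
}
EVALUATION = {f"a{i}": 2000 for i in range(10)}


def total(counts):
    """Sum the annotated instances over all annotators."""
    return sum(counts.values())


def completion(counts, assigned):
    """Fraction of assigned instances each annotator completed."""
    return {annotator: done / assigned for annotator, done in counts.items()}


print(total(TRAINING))    # 99809
print(total(EVALUATION))  # 20000

# Annotator with the lowest completion rate in the training set.
rates = completion(TRAINING, assigned=10000)
lowest = min(rates, key=rates.get)
print(lowest, TRAINING[lowest])  # a2 9929
```

Both sums match the "Annotated for vrsta" figures in the reports, so the 191 missing training annotations (100000 - 99809) are fully accounted for by the per-annotator gaps.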


-----------------
-----OVERALL-----
-----------------
Annotated for  vrsta : 99809
vrsta
0 ni sporni govo . . .