Cross-Lingual model for hate speech detection on Twitter: a case of Swahili and Swahili-English slang

Kariuki, A. O.

Files

Cross-Lingual model for hate speech detection on Twitter - a case of Swahili and Swahili-English slang.pdf (2.94 MB)

Date

2023

Authors

Kariuki, A. O.

Publisher

Strathmore University

Abstract

The prevalence and entrenchment of online hate, hate crimes and hate speech in contemporary society concern organisations and governments. Detecting online hate, especially on social media, has proven daunting because offensive language takes many forms and most training data are topic-specific. Moreover, available solutions and research are geared towards the English language; detecting online hate in low-resource languages such as Swahili and other indigenous African languages is therefore much more difficult. The problem has worsened because social platforms such as Twitter, Facebook, Instagram, Rumble and YouTube enable users to converse and participate in their native dialects. This research proposed using cross-lingual transfer learning for hate detection to overcome these challenges. As part of the research's experimental methodology, a cross-lingual model built on a pre-trained BERT model was developed, and its performance was compared with that of more established text classifiers such as SVM, NB and LR. Through the Twitter API, more than 300K tweets with a Kenyan focus were collected. These tweets centred on the most divisive moments in Kenya's history, namely the 2013, 2017 and 2022 general elections. The data were collected using a set of predetermined criteria, including user location, tweet location, hashtags, pro-hate accounts, hate patterns and racial epithets. A random sample of over 20K tweets was annotated as hate or non-hate for use in model development, training and validation. The developed cross-lingual model achieved an area under the ROC curve (AUC) of 0.77 and an accuracy of 77 per cent. This study makes the following contributions. Primarily, the research established an empirical framework and methodology for using transfer learning to identify offensive language in low-resource languages. Additionally, this strategy was crucial in creating a text classification framework that could be broadly applied to different types of abusive language on online platforms. The model's results may thus be used to inform data-driven legislation regarding the detection of online hate as well as evidence-based decisions by pertinent intelligence agencies.

Keywords: Deep learning, free speech, freedom of speech, hate detection, hate speech, machine learning, natural language processing, social media, Twitter.
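
The two evaluation figures reported in the abstract, accuracy and area under the ROC curve, can be illustrated with a minimal Python sketch. It is not taken from the thesis: the function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made purely for illustration, and the thesis may have computed its metrics differently.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of tweets whose thresholded score matches the annotation.

    The 0.5 threshold is an assumption, not a value from the thesis.
    """
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed by pairwise comparison.

    This equals the probability that a randomly chosen hate tweet gets a
    higher score than a randomly chosen non-hate tweet (ties count 0.5).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up annotations (1 = hate, 0 = non-hate) and classifier scores.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.2, 0.4, 0.6, 0.7, 0.1]

print("accuracy:", accuracy(toy_labels, toy_scores))
print("ROC AUC:", roc_auc(toy_labels, toy_scores))
```

Accuracy depends on the chosen threshold, whereas the AUC summarises how well the scores rank hate above non-hate tweets across all thresholds, which is why the two are usually reported together.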

Description

Full-text thesis

Citation

Kariuki, A. O. (2023). Cross-Lingual model for hate speech detection on Twitter: A case of Swahili and Swahili-English slang [Strathmore University]. http://hdl.handle.net/11071/13529

URI

http://hdl.handle.net/11071/13529

Collections

MSIT Theses and Dissertations (2023)
