Cross-Lingual model for hate speech detection on Twitter: a case of Swahili and Swahili-English slang

dc.contributor.author: Kariuki, A. O.
dc.date.accessioned: 2023-10-06T07:36:14Z
dc.date.available: 2023-10-06T07:36:14Z
dc.date.issued: 2023
dc.description: Full-text thesis
dc.description.abstract: The prevalence and entrenchment of online hate, hate crimes and hate speech in contemporary society concern organisations and governments. Detecting online hate, especially on social media, has proven daunting because offensive language takes many forms and most training data are topic-specific. Moreover, available solutions and research are geared towards the English language, so detecting online hate in low-resource languages such as Swahili and other indigenous African languages is much more difficult. The problem has worsened because social platforms such as Twitter, Facebook, Instagram, Rumble and YouTube enable users to converse and participate in their native dialects. To overcome these challenges, this research proposed using cross-lingual transfer learning for hate detection. As part of the experimental methodology, a cross-lingual model built on a pre-trained BERT model was developed, and its performance was compared with that of more established text classifiers: SVM, NB, and LR. More than 300K tweets with a Kenyan focus were collected through the Twitter API. These tweets centred on Kenya's most divisive moments in history, namely the 2013, 2017, and 2022 general elections. The data were collected using a set of predetermined criteria, including user location, tweet location, hashtags, pro-hate accounts, hate patterns, and racial epithets. For model development, training and validation, a random sample of over 20K tweets was annotated as hate or non-hate. The developed cross-lingual model achieved an area under the ROC curve (AUC) of 0.77 and an accuracy of 77 per cent. The study makes two contributions. First, it established an empirical framework and methodology for using transfer learning to identify offensive language in low-resource languages. Second, this strategy was crucial in creating a text classification framework that could be broadly applied to different types of abusive language on online platforms. The model's results may thus be used to inform data-driven legislation regarding the detection of online hate, as well as evidence-based decisions by pertinent intelligence agencies.
Keywords: Deep learning, free speech, freedom of speech, hate detection, hate speech, machine learning, natural language processing, social media, Twitter.
dc.identifier.citation: Kariuki, A. O. (2023). Cross-Lingual model for hate speech detection on Twitter: A case of Swahili and Swahili-English slang [Strathmore University]. http://hdl.handle.net/11071/13529
dc.identifier.uri: http://hdl.handle.net/11071/13529
dc.language.iso: en
dc.publisher: Strathmore University
dc.title: Cross-Lingual model for hate speech detection on Twitter: a case of Swahili and Swahili-English slang
dc.type: Thesis
Files
Original bundle
Name: Cross-Lingual model for hate speech detection on Twitter - a case of Swahili and Swahili-English slang.pdf
Size: 2.94 MB
Format: Adobe Portable Document Format
Description: Full-text thesis
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission