Use of regular expressions for multilingual detection of Hate speech in Kenya
Maloba, Wilson
Strathmore University
Offensive language in online forums and other text-based communication media such as SMS is nothing new. It is not uncommon to see notices such as ‘this comment has been removed due to a low rating’ or simply ‘comment removed’ on websites. Most sites provide a ‘report abuse’ button so that users can flag comments they consider abusive. So how are site administrators able to detect offensive text? Several methods are used, including manual filtering and multi-level classifiers. The focus of this paper, however, is on regular expressions (regex for short), which offer a powerful way to detect string patterns in text. Hate speech has lately become a sensitive issue in Kenya, given that it helped trigger the post-election violence (PEV) of 2007/2008. Yet the detection of these hate messages relies mostly on what is captured in the media or on text that an online user happens to flag. This paper presents a method that uses regular expressions, which are tried and tested, to detect hate speech in Kenya while taking into consideration three languages: English, Swahili and Sheng.
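To make the idea of regex-based detection across several languages concrete, the following is a minimal sketch in Python and not the thesis's actual implementation. The per-language word lists, the names `LEXICON`, `compile_patterns` and `flag_message`, and the whole-word matching strategy are all assumptions introduced for illustration; the tokens are neutral placeholders rather than real terms in any of the three languages.

```python
import re

# Hypothetical placeholder lexicon. These neutral stand-in tokens are NOT taken
# from the thesis; a real system would load curated term lists per language.
LEXICON = {
    "english": ["badword", "slur"],
    "swahili": ["nenobaya"],
    "sheng": ["manenombaya"],
}


def compile_patterns(lexicon):
    """Build one case-insensitive, whole-word regex per language."""
    patterns = {}
    for language, terms in lexicon.items():
        # re.escape keeps any punctuation in a term from acting as regex syntax.
        alternation = "|".join(re.escape(term) for term in terms)
        patterns[language] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def flag_message(text, patterns):
    """Return {language: [matched terms]} for every language with a match."""
    hits = {}
    for language, pattern in patterns.items():
        found = [match.group(0).lower() for match in pattern.finditer(text)]
        if found:
            hits[language] = found
    return hits


PATTERNS = compile_patterns(LEXICON)
```

Calling `flag_message(text, PATTERNS)` on an incoming comment or SMS returns an empty dictionary when nothing matches, so a moderator queue could be fed only the messages with at least one hit. Whole-word matching is a deliberate simplification in this sketch: it avoids flagging innocent words that merely contain a listed term, but it also misses deliberate misspellings.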
Thesis submitted to the Faculty of Information Technology in partial fulfillment of
the requirements for the award of a Master of Science Telecommunications Innovation and
Development of Strathmore University