Use of regular expressions for multilingual detection of hate speech in Kenya
Offensive language in online forums and other text-based communication media such as SMS is nothing new. It is not uncommon to see notices such as ‘this comment has been removed due to a low rating’ or simply ‘comment removed’ on websites. Most sites provide a ‘report abuse’ button so that users can flag comments they consider abusive. So how are site administrators able to detect offensive text? Several methods are used, including manual filtering and multi-level classifiers. The focus of this paper, however, is on regular expressions (regex for short), which offer a powerful way to detect string patterns in text. Hate speech has recently become a sensitive issue in Kenya, given that it helped trigger the post-election violence (PEV) of 2007/2008. Yet the detection of these hate messages still relies mostly on what is reported in the media or on text that an online user happens to flag. This paper presents a method that uses regular expressions, a tried and tested technique, to detect hate speech in Kenya across three languages: English, Swahili and Sheng.
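As a minimal illustration of the general idea, the sketch below compiles one case-insensitive pattern per language and reports which terms a message matches. It is not the method developed in this paper: the word lists are hypothetical placeholders (not real hate terms), and the names `LEXICON`, `build_pattern` and `flag_message` are assumptions introduced only for this example.

```python
import re

# Hypothetical placeholder lexicon. These are NOT real hate terms and are
# not taken from the paper; a real system would need curated word lists.
LEXICON = {
    "english": ["badword", "slurword"],
    "swahili": ["nenobaya"],
    "sheng": ["shengterm"],
}


def build_pattern(terms):
    """Compile one whole-word, case-insensitive pattern for a list of terms."""
    # re.escape guards against regex metacharacters inside a term.
    alternation = "|".join(re.escape(term) for term in terms)
    # \b restricts matches to whole words, so "badwords" is not flagged.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)


# One compiled pattern per language (assumed structure, for illustration).
PATTERNS = {lang: build_pattern(terms) for lang, terms in LEXICON.items()}


def flag_message(text):
    """Return a dict mapping each language to the listed terms found in text."""
    hits = {}
    for lang, pattern in PATTERNS.items():
        found = [match.group(0).lower() for match in pattern.finditer(text)]
        if found:
            hits[lang] = found
    return hits


if __name__ == "__main__":
    print(flag_message("This message has a BadWord and nenobaya in it"))
```

Keeping a separate pattern per language makes it possible to report which language a flagged term belongs to, which is useful when a single message mixes English, Swahili and Sheng. Whole-word matching is a deliberate simplification here; it will miss misspellings and deliberate obfuscation.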