dc.contributor.author: Mugambi, Sharon Kaari
dc.date.accessioned: 2017-11-22T10:01:46Z
dc.date.available: 2017-11-22T10:01:46Z
dc.date.issued: 2017
dc.identifier.uri: http://hdl.handle.net/11071/5657
dc.description: Thesis submitted in partial fulfillment of the requirements for the Degree of Master of Science in Information Technology (MSIT) at Strathmore University [en_US]
dc.description.abstract: Hate speech on social media has unfortunately become a common occurrence in the Kenyan online community, largely due to advances in mobile computing and the internet. Incidents of hate speech on social media can spread quickly among online users and escalate into acts of violence and hate crimes through incitement, as was the case during the 2007-2008 post-election violence. With the upcoming, highly contested 2017 general elections, monitoring social media platforms is critical for detecting hate speech as early as possible and preventing escalation that may result in violence. Current efforts by the National Cohesion and Integration Commission to monitor hate speech on social media involve the use of web crawlers to collect possible instances of hate speech based on specific keywords. Human monitors then have to analyze the collected data to determine which instances are actually hate speech. This human analysis is not only time-consuming and overwhelming but also introduces subjective notions of what constitutes hate speech. This research proposed the application of machine learning techniques to build a binary text classifier to detect hate speech on Twitter. Hate speech data was collected and labelled to build the corpora. A Support Vector Machine model was trained and validated on the labelled text data using unigram features and term frequency-inverse document frequency (TF-IDF) weighting. The research employed an experimental approach to determine which combination of features, weighting schemes and classifiers gives the best performance on the collected hate speech data. Bigram features weighted using TF-IDF and fed into a Support Vector Machine classifier gave the best classification performance, with an accuracy of 76.22 percent and an area under the Receiver Operating Characteristic (ROC) curve of 0.76. [en_US]
dc.language.iso: en [en_US]
dc.publisher: Strathmore University [en_US]
dc.subject: Hate Speech -- Social Media [en_US]
dc.subject: Machine Learning [en_US]
dc.subject: Support Vector Machine [en_US]
dc.subject: TF-IDF [en_US]
dc.subject: Bigram [en_US]
dc.title: Sentiment analysis for hate speech detection on social media: TF-IDF weighted N-Grams based approach [en_US]
dc.type: Thesis [en_US]
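
The abstract names bigram features weighted by term frequency-inverse document frequency as the best-performing input to the classifier. As a rough illustration of what such features look like, the following is a minimal Python sketch, not the thesis's actual implementation. The function names, the whitespace tokenizer, the example sentences, and the specific weighting formula (raw count times ln(N / document frequency)) are all assumptions made for this example; the record does not state which tools or TF-IDF variant were used.

```python
import math
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and return adjacent word pairs."""
    tokens = text.lower().split()
    return list(zip(tokens, tokens[1:]))


def tfidf_bigrams(docs):
    """Return one {bigram: weight} dict per document.

    Weight = raw count in the document * ln(N / document frequency).
    This is one common TF-IDF variant, assumed here for illustration.
    """
    counts = [Counter(bigrams(d)) for d in docs]
    # Document frequency: in how many documents each bigram occurs.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    n = len(docs)
    return [
        {gram: tf * math.log(n / df[gram]) for gram, tf in c.items()}
        for c in counts
    ]


# Made-up neutral sentences standing in for labelled tweets.
docs = [
    "the match was great",
    "the match was dull",
    "great weather today",
]
weights = tfidf_bigrams(docs)
```

In this sketch a bigram shared by several documents, such as ("the", "match"), receives a lower weight than one unique to a single document, such as ("was", "great"). The resulting sparse weight dictionaries are the kind of feature vectors that would then be passed to a classifier such as a Support Vector Machine; that step is not shown here.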

