Twitter sentiment analysis tool for detecting crime hotspots: a case of Nairobi, Kenya

Onyango, Kevin Omondi
Strathmore University
Insecurity brought about by crime continues to be a major thorn in the flesh of citizens living in Kenya's urban centres. Although incidents of crime are frequently reported from different regions in Kenya, more cases are reported from urban centres than from rural ones, especially in informal settlement areas. This has been attributed to the rapid urbanisation of Kenya’s major towns. Violent crimes are costly. Murders, rapes, assaults and robberies impose concrete economic costs on the victims who survive, as well as on the families of those who lose their lives, through lost earnings and physical and emotional tolls. The Kenyan government has invested heavily in setting up Internet Protocol closed-circuit television cameras (IP CCTVs) in Nairobi's Central Business District in a bid to curb crime. In 2015, the government implemented a community policing strategy at various levels, namely market, estate and house level, among others. This community policing strategy is known as Nyumba Kumi. The culture in urban centres, especially Nairobi, makes it very difficult to implement Nyumba Kumi. For instance, Nairobi is a city where people are less concerned with the affairs of their neighbours. For Nyumba Kumi to be effective in Nairobi, a culture change has to occur, and culture changes usually take time. On the other hand, CCTVs have proven to be a useful tool in tracking down criminals and bringing them to book. However, maintaining CCTVs is quite expensive, and CCTV footage has been reported missing in some cases when investigators needed it. Data mining algorithms can be employed to extract useful patterns from social media posts, especially tweets from Twitter, to monitor crime. This study proposed a Twitter sentiment analysis tool which was used to detect crime hotspots in Nairobi. The tool employed machine learning techniques to build a binary classifier for detecting crime hotspots.
This research fetched sample crime-relevant tweets from Twitter, which were used to build the corpora. A Support Vector Machine (SVM) model was then trained and validated on the labelled text data using bigram features and term frequency-inverse document frequency (tf-idf) weighting. To determine which combination of features gave the best performance on the data collected, the SVM model was compared with Naive Bayes, K-nearest neighbour and random forest machine learning algorithms. The experiments showed that the best model for detecting crime hotspots using Twitter was an SVM with bigram features weighted using tf-idf. The SVM model achieved an accuracy of 88%, the highest of the algorithms compared.
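The bigram and tf-idf weighting step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the tokenisation (lowercasing and whitespace splitting), the particular tf-idf variant (relative term frequency multiplied by ln(N/df)), the function names and the three example tweets are all assumptions made for illustration. The resulting weighted vectors are the kind of input a classifier such as an SVM would be trained on.

```python
import math
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and return adjacent word pairs."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def tfidf_bigram_vectors(docs):
    """Return one {bigram: weight} dict per document.

    tf  = count of the bigram in the document / number of bigrams in it
    idf = ln(N / df), where df is the number of documents containing it
    (one common tf-idf variant; assumed here, not taken from the thesis)
    """
    tokenised = [bigrams(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents does each bigram occur?
    doc_freq = Counter()
    for grams in tokenised:
        doc_freq.update(set(grams))

    vectors = []
    for grams in tokenised:
        counts = Counter(grams)
        total = len(grams) or 1  # guard against documents with < 2 tokens
        vectors.append({
            g: (c / total) * math.log(n_docs / doc_freq[g])
            for g, c in counts.items()
        })
    return vectors


# Hypothetical example tweets, invented for illustration only.
tweets = [
    "armed robbery reported along moi avenue",
    "armed robbery near the bus station",
    "traffic jam along moi avenue this morning",
]
vectors = tfidf_bigram_vectors(tweets)
```

In this sketch a bigram that occurs in only one tweet (such as "robbery reported") receives a higher weight than one shared across several tweets (such as "armed robbery"), which is the effect tf-idf weighting is meant to have.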
A Thesis Submitted to the School of Computing and Engineering Sciences in Partial Fulfilment for the Requirement of the Degree of Master of Science in Information Technology of Strathmore University
Insecurity, Crime, Twitter, Machine learning, Support vector machine, TF-IDF, Bigram