Comparative sentiment analysis of techniques for cyberbullying detection on Twitter
Kanam, Victor Otieno
Cyberbullying has become a common vice on social media platforms and is quickly getting out of hand. Psychological research on its effects shows dire trends among victims, in some cases leading to suicide. Currently, the efforts of social media sites to curb cyberbullying are largely user-centered. Twitter provides a series of measures for dealing with instances of cyberbullying, including blocking users, reporting users, deleting posts and tagging tweets with warning labels. However, these approaches are reactive rather than preventive. This leaves a gap in software system design, which should eliminate the need for human intervention by implementing technological methods for curbing cyberbullying. This research applied machine learning techniques to build a text classifier that detects instances of cyberbullying as tweets are being composed. Data was collected from Twitter, then processed and labelled appropriately. A Support Vector Machine (SVM) model was developed, trained and validated on the labelled text data using bigram features and term frequency-inverse document frequency (TF-IDF) weighting. An experimental approach was taken to determine which combination of features gave the most desirable performance on the collected data. A comparative analysis was then carried out across text classification algorithms (including Naïve Bayes, K-Nearest Neighbor and Random Forest), each coupled with the different features. The SVM classifier coupled with the bigram feature emerged as the best classifier when using sentiment to classify text documents, with an accuracy of 84.22%.
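
The bigram features and term frequency-inverse document frequency weighting mentioned above can be illustrated with a minimal sketch. It is not the study's implementation: the whitespace tokenizer, the exact weighting formula (relative term frequency times the natural log of N/df), the function names and the toy sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigrams(text):
    """Lowercase, split on whitespace, and return adjacent word pairs."""
    tokens = text.lower().split()
    return [" ".join(pair) for pair in zip(tokens, tokens[1:])]


def tfidf_bigram_vectors(docs):
    """Return one {bigram: weight} dict per document.

    Assumed weighting (not stated in the abstract):
      tf  = count of the bigram / number of bigrams in the document
      idf = ln(N / df), N = number of documents,
            df = number of documents containing the bigram
    """
    doc_bigrams = [bigrams(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: count each bigram once per document.
    df = Counter()
    for grams in doc_bigrams:
        df.update(set(grams))

    vectors = []
    for grams in doc_bigrams:
        counts = Counter(grams)
        total = len(grams)
        vectors.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return vectors


# Hypothetical toy sentences, not taken from the study's data set.
docs = [
    "you are so stupid",
    "you are so kind",
    "nobody likes you",
]
vectors = tfidf_bigram_vectors(docs)
```

Under these assumptions, a bigram shared by several documents (such as "you are") receives a lower weight than one that appears in a single document (such as "so stupid"). Weighted vectors of this kind are what a classifier such as an SVM would then be trained on.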