Machine learning for multi-class identification of Gender Based Violence on social media
Publisher
Strathmore University
Abstract
This study demonstrates the use of Machine Learning algorithms to classify forms of Gender Based Violence (GBV) in social media data. Data mining was used to collect 1 million tweets posted on Twitter between January 2012 and January 2023, using keywords associated with GBV. Of these, 160,000 tweets were manually labelled with the form of GBV they describe: physical, economic, sexual or emotional violence. The rest of the data was stored in SQLite as a GBV database. The tweets were filtered and analysed using Exploratory Data Analysis and Natural Language Processing techniques such as sentiment analysis and topic modelling. Machine learning algorithms, including Naïve Bayes, Random Forest, Multinomial Logistic Regression and Support Vector Machines (SVM), were trained on the labelled data to predict the form of GBV in each tweet. The models were evaluated using accuracy, precision, recall, F1 score and AUC. SVM with GloVe features achieved the highest F1 score (61%) and accuracy (62%), followed by Multinomial Logistic Regression with an F1 score of 60% and an accuracy of 61%. A web application built with Streamlit hosts the results of the study and lets users obtain the predicted form of GBV for text they enter or for records selected from the GBV database. This agrees with Muneer (2020), who found Logistic Regression and SVM to outperform other classifiers in detecting cyberbullying on Twitter without the involvement of victims. The classification of GBV in this study is intended to inform key stakeholders about the extent and forms of GBV incidents, and to help identify or structure programs that offer timely and relevant support to survivors of GBV. The insights can also be used to build social media-based interventions that support survivors as soon as they are identified.
Key words: Gender Based Violence (GBV), social media, Machine Learning, Classification
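The abstract reports one F1 score per model for a four-class task but does not state how the per-class scores were combined. The sketch below, using only the Python standard library, shows accuracy plus macro- and support-weighted F1 for the four GBV forms named above. The function names and the label lists are hypothetical, invented only to illustrate the arithmetic; they are not the study's code or data.

```python
from collections import Counter

# The four forms of GBV named in the abstract, used here as class labels.
LABELS = ["physical", "economic", "sexual", "emotional"]


def accuracy(y_true, y_pred):
    """Fraction of tweets whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true, y_pred, labels):
    """Precision, recall and F1 for each label, treating it one-vs-rest."""
    pairs = list(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1: every GBV form counts equally."""
    scores = per_class_scores(y_true, y_pred, labels)
    return sum(f1 for _, _, f1 in scores.values()) / len(labels)


def weighted_f1(y_true, y_pred, labels):
    """Per-class F1 weighted by how many true examples each class has."""
    support = Counter(y_true)
    scores = per_class_scores(y_true, y_pred, labels)
    return sum(scores[l][2] * support[l] for l in labels) / len(y_true)


# Hypothetical labels, invented only to illustrate the arithmetic.
y_true = ["physical", "physical", "economic", "sexual",
          "emotional", "emotional", "sexual", "physical"]
y_pred = ["physical", "emotional", "economic", "sexual",
          "emotional", "physical", "emotional", "physical"]

print(f"accuracy    {accuracy(y_true, y_pred):.3f}")             # 0.625
print(f"macro F1    {macro_f1(y_true, y_pred, LABELS):.3f}")     # 0.683
print(f"weighted F1 {weighted_f1(y_true, y_pred, LABELS):.3f}")  # 0.642
```

In this toy case macro F1 is higher than weighted F1 because the single, correctly classified economic tweet counts as a full quarter of the macro average. With imbalanced classes the choice of averaging can move the reported F1 noticeably, so it is worth stating alongside the figures.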
Description
Full-text thesis
Citation
Mutahi, E. W. (2025). Machine learning for multi-class identification of Gender Based Violence on social media [Strathmore University]. https://hdl.handle.net/11071/16526