Sentiment analysis model for detection of radicalization on Twitter
Oluoch, Emmanuel Olang'
In the recent past, radical acts such as terrorist attacks on highly populated areas have become a major security issue in Kenya. As a result, there is a constant fear of becoming a victim of violent acts perpetrated by individuals who were radicalized before carrying them out. Current efforts by the security organs to detect radicalization involve monitoring public communication channels and relying on information gathered from community policing, random suspect searches, and other intelligence services. However, these approaches have several drawbacks. The first is the manual human intervention needed in decision making, even in cases where technology such as web crawlers has been used. In addition, automated text classification techniques rely on feature generation techniques that do not take the context of the text into account and that are subject to sparsity when classifying long texts. Meanwhile, to grow their membership, radical groups use public social media platforms such as Twitter to disseminate radical ideologies and to recruit those who support them. In this study, I propose the use of a sentiment analysis model to detect online radicalization. The model performs text classification using artificial neural networks, which learn word relations within a corpus and generate corresponding features represented as lower-dimensional vectors. Using Continuous Bag of Words (CBOW) encoding and Word2Vec methods, tokens were represented as integers and an embedding of all corpus words was generated, i.e., trained for use in one of the layers of the classifier's neural network. Using the embeddings and labeled data instances, a four-layered recurrent neural network classifier was developed for text classification of radical and non-radical statements.
The classifier then uses the pre-trained embeddings to look up the corresponding vector values for the tokens within the padded input sequence, classifies the new instance, and returns a rounded float value of 1 or 0 indicating its class. To train the model, I used pre-existing data of tweets collected from Twitter using keyword guides, and I also tested it using data from the Kenyan setting. The model achieved an accuracy score of 95% after varying iterations of training, testing, and validation.
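
As an illustration only, the inference path described above (tokens encoded as integers, a padded input sequence, an embedding lookup, and a rounded 1 or 0 output) might be sketched in Python as follows. This is not the study's implementation: the function names, the reserved indices, the toy one-dimensional embeddings, and the weights are all hypothetical, and a mean-pooled logistic unit stands in for the four-layered recurrent network, whose details are not given here.

```python
import math

PAD = 0  # assumed index reserved for padding
UNK = 1  # assumed index for tokens not seen in the corpus


def build_vocab(texts):
    """Map each distinct lowercase token to an integer, starting at 2."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 2
    return vocab


def encode(text, vocab, max_len):
    """Turn a text into a fixed-length sequence of integers, padded with PAD."""
    ids = [vocab.get(token, UNK) for token in text.lower().split()][:max_len]
    return ids + [PAD] * (max_len - len(ids))


def classify(ids, embeddings, weights, bias):
    """Look up embeddings, mean-pool them, and return a rounded 1.0 or 0.0.

    The mean-pooled logistic unit is a stand-in for the recurrent layers.
    """
    vectors = [embeddings[i] for i in ids if i != PAD]
    if not vectors:  # an all-padding input contributes a zero vector
        vectors = [[0.0] * len(weights)]
    pooled = [sum(v[d] for v in vectors) / len(vectors)
              for d in range(len(weights))]
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    score = 1.0 / (1.0 + math.exp(-logit))  # sigmoid, in (0, 1)
    return float(round(score))


# Toy corpus and hand-set "pre-trained" embeddings (illustrative values only).
corpus = ["join the fight", "nice weather today"]
vocab = build_vocab(corpus)
embeddings = {UNK: [0.0], 2: [2.0], 3: [0.0], 4: [2.0],
              5: [-2.0], 6: [-2.0], 7: [-2.0]}
weights, bias = [1.0], 0.0

label_a = classify(encode("join the fight", vocab, 5), embeddings, weights, bias)
label_b = classify(encode("nice weather today", vocab, 5), embeddings, weights, bias)
```

In a real pipeline the embedding table would come from Word2Vec training on the corpus and the weights from supervised training on labeled tweets; here they are fixed by hand so that the lookup, padding, and rounding steps are easy to follow.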