A Hybrid model for the detection of phishing URLs based on machine learning
Publisher
Strathmore University
Abstract
Phishing attacks have become increasingly prevalent and sophisticated, posing a significant threat to the online security of both individuals and organizations. Traditional phishing detection methods, such as blacklists and heuristic-based systems, are often ineffective against modern phishing tactics because they cannot adapt to the evolving nature of these attacks. Machine Learning (ML) can be applied to signature databases of phishing URLs to enhance the effectiveness of blacklists. This research explores the potential of ML to improve phishing URL detection. It proposes a hybrid model that combines two or more of the following algorithms. Decision trees split the dataset into branches based on feature values such as URL length, presence of suspicious keywords, or domain age; each internal node represents a decision on a feature, and each leaf node represents a classification as either legitimate or phishing. XGBoost takes an ensemble learning approach, combining multiple decision trees through gradient boosting to improve accuracy; it processes features such as URL length and character composition. Convolutional Neural Networks (CNNs) analyse visual representations of URLs and website content; by extracting features such as character composition from these visual inputs, they can identify patterns indicative of phishing sites. Support Vector Machines find the optimal hyperplane that separates phishing from legitimate sites based on extracted features. The developed solution will use the PhishTank database, a community-driven repository of known phishing URLs, to analyse historical data and identify common patterns in previously flagged URLs. The expected outcome is a more precise and comprehensive blacklist of phishing URLs.
This approach strengthens existing blacklists and facilitates proactive identification of emerging phishing threats based on evolving patterns. The solution also implements a feedback mechanism for model retraining, where misclassified URLs are identified and used to update the training dataset. The outcome will be improved detection accuracy and adaptability of the solution. The study will use a modified Agile development methodology with five stages: Planning, Design, Development, Testing, and Review/Feedback.
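The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the keyword list, the feature names, the majority-vote combination rule, the tie-breaking choice, and all function names are assumptions made for illustration. It shows lexical feature extraction from a URL, one possible way to combine the labels of several base models, and a simple version of the feedback step in which misclassified URLs are folded back into the training data.

```python
from urllib.parse import urlparse

# Hypothetical keyword list, for illustration only.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "update", "account")


def extract_features(url):
    """Compute simple lexical features of the kind the abstract mentions."""
    host = urlparse(url).netloc.lower()
    lowered = url.lower()
    return {
        "url_length": len(url),
        "digit_ratio": sum(c.isdigit() for c in url) / max(len(url), 1),
        "num_dots": host.count("."),
        "keyword_hits": sum(k in lowered for k in SUSPICIOUS_KEYWORDS),
    }


def majority_vote(predictions):
    """Combine base-model labels; a tie is treated as 'phishing' (assumption)."""
    phishing_votes = sum(p == "phishing" for p in predictions)
    return "phishing" if 2 * phishing_votes >= len(predictions) else "legitimate"


def collect_misclassified(reviewed, predict):
    """Return (url, true_label) pairs where predict(url) disagrees with the label."""
    return [(url, label) for url, label in reviewed if predict(url) != label]


def update_training_set(training_set, misclassified):
    """Merge corrected examples into the training data; corrected labels win."""
    merged = dict(training_set)
    merged.update(misclassified)
    return sorted(merged.items())
```

In practice the labels passed to `majority_vote` would come from trained models such as a decision tree, XGBoost, or an SVM, and the output of `update_training_set` would feed the next retraining round. Treating ties as phishing favours caution over convenience and is only one of several reasonable choices.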
Description
Full-text thesis
Citation
Oduor, C. O. (2025). A Hybrid model for the detection of phishing URLs based on machine learning [Strathmore University]. https://hdl.handle.net/11071/16416