A Hybrid model for the detection of phishing URLs based on machine learning

Oduor, C. O.

Files

A Hybrid model for the detection of phishing URLs based on machine learning.pdf (2.49 MB)

Date

2025

Authors

Oduor, C. O.

Publisher

Strathmore University

Abstract

Phishing attacks have become increasingly prevalent and sophisticated, posing a significant threat to online security for both individuals and organizations. Traditional phishing detection methods, such as blacklists and heuristic-based systems, are often ineffective against modern phishing tactics because they cannot adapt to the evolving nature of these attacks. Machine Learning (ML) can be applied to process signature databases of phishing URLs to enhance the effectiveness of blacklists. This research explores the potential of using ML to enhance phishing URL detection. It proposes a hybrid model that will utilise a combination of two or more of the following algorithms. Decision trees create a model that splits the dataset into branches based on feature values, such as URL length, presence of suspicious keywords, or domain age; each internal node represents a decision based on a feature, and each leaf node represents a classification as either legitimate or phishing. XGBoost implements an ensemble learning approach that classifies URLs by combining multiple decision trees; it processes features like URL length and character composition and improves accuracy through gradient boosting. Convolutional Neural Networks (CNNs) analyse visual representations of URLs and website content; by extracting features such as character composition from these visual inputs, they can identify patterns indicative of phishing sites. Support Vector Machines function by finding the optimal hyperplane that separates phishing from legitimate sites based on extracted features. The developed solution will utilise the PhishTank database, which provides a community-driven repository of known phishing URLs, to analyse historical data and identify common patterns in previously flagged URLs. The expected outcome is a more precise and comprehensive blacklist of phishing URLs.
This approach strengthens existing blacklists and facilitates proactive identification of emerging phishing threats based on evolving patterns. The solution also implements a feedback mechanism for model retraining, in which misclassified URLs are identified and used to update the training dataset. This is expected to improve the detection accuracy and adaptability of the solution. The study will use a modified Agile development methodology with five stages: Planning, Design, Development, Testing, and Review/Feedback.
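
The ideas in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. The keyword list, thresholds, weights, and all function names are hypothetical, and the two hand-written rules only stand in for trained models: one mimics the node-by-node tests of a decision tree, the other the linear decision boundary of an SVM. The abstract does not say how the hybrid model combines its components, so majority voting is used here purely as an assumption.

```python
from urllib.parse import urlparse

# Hypothetical keyword list, chosen only for illustration.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "account", "update")


def extract_features(url: str) -> dict:
    """Derive simple lexical features of the kind the abstract mentions."""
    host = urlparse(url).hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "digit_count": sum(ch.isdigit() for ch in url),
        "subdomain_count": max(host.count(".") - 1, 0),
        "keyword_hits": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
        "has_at_symbol": "@" in url,
    }


def toy_tree_predict(features: dict) -> str:
    """Stand-in for a decision tree: each `if` is an internal node, each return a leaf."""
    if features["keyword_hits"] >= 2:   # assumed threshold
        return "phishing"
    if features["url_length"] > 75:     # assumed threshold
        return "phishing"
    return "legitimate"


def toy_linear_predict(features: dict) -> str:
    """Stand-in for an SVM: the sign of a linear score (weights are invented, not learned)."""
    score = (
        0.02 * features["url_length"]
        + 0.5 * features["keyword_hits"]
        + 1.0 * features["has_at_symbol"]
        + 0.3 * features["subdomain_count"]
        - 1.5
    )
    return "phishing" if score > 0 else "legitimate"


def majority_vote(predictions: list) -> str:
    """Combine per-model labels; ties resolve to 'phishing' (a conservative assumption)."""
    phishing_votes = sum(label == "phishing" for label in predictions)
    return "phishing" if phishing_votes * 2 >= len(predictions) else "legitimate"


def hybrid_predict(url: str) -> str:
    """Run every stand-in model on the same features and vote on the result."""
    features = extract_features(url)
    return majority_vote([toy_tree_predict(features), toy_linear_predict(features)])


def collect_misclassified(samples: list, predict) -> list:
    """Feedback step: return (url, true_label) pairs the predictor got wrong."""
    return [(url, label) for url, label in samples if predict(url) != label]
```

In a real pipeline the misclassified pairs returned by `collect_misclassified` would be appended to the training data before the models are refitted; here the function only shows where such a feedback step would sit.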

Description

Full-text thesis

Citation

Oduor, C. O. (2025). A Hybrid model for the detection of phishing URLs based on machine learning [Strathmore University]. https://hdl.handle.net/11071/16416

URI

https://hdl.handle.net/11071/16416

Collections

MSIT Theses and Dissertations (2025)
