A Hybrid model for the detection of phishing URLs based on machine learning

dc.contributor.author: Oduor, C. O.
dc.date.accessioned: 2026-04-21T09:47:56Z
dc.date.issued: 2025
dc.description: Full-text thesis
dc.description.abstract: Phishing attacks have become increasingly prevalent and sophisticated, posing a significant threat to the online security of both individuals and organizations. Traditional phishing detection methods, such as blacklists and heuristic-based systems, are often ineffective against modern phishing tactics because they cannot adapt to the evolving nature of these attacks. Machine Learning (ML) can be applied to signature databases of phishing URLs to make blacklists more effective, and this research explores that potential. It proposes a hybrid model that combines two or more of the following algorithms. Decision trees split the dataset into branches based on feature values such as URL length, presence of suspicious keywords, or domain age; each internal node represents a decision on a feature, and each leaf node represents a classification as either legitimate or phishing. XGBoost takes an ensemble learning approach, combining multiple decision trees through gradient boosting to improve accuracy; it processes features such as URL length and character composition. Convolutional Neural Networks (CNNs) analyse visual representations of URLs and website content, extracting features such as character composition from these visual inputs to identify patterns indicative of phishing sites. Support Vector Machines find the optimal hyperplane that separates phishing from legitimate sites based on extracted features. The developed solution will use the PhishTank database, a community-driven repository of known phishing URLs, to analyse historical data and identify common patterns in previously flagged URLs. The expected outcome is a more precise and comprehensive blacklist of phishing URLs.
This approach strengthens existing blacklists and facilitates proactive identification of emerging phishing threats based on evolving patterns. The solution also implements a feedback mechanism for model retraining, in which misclassified URLs are identified and used to update the training dataset, with the aim of improving detection accuracy and adaptability. The study will use a modified Agile development methodology with five stages: Planning, Design, Development, Testing, and Review/Feedback.
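The abstract names several lexical features (URL length, suspicious keywords, character composition), a hybrid of several classifiers, and a feedback step that collects misclassified URLs. The sketch below illustrates those three ideas in plain Python. It is not the thesis's implementation: the keyword list, the feature names, the simple majority vote used to combine models, and all function names are assumptions made for illustration only.

```python
from urllib.parse import urlparse

# Hypothetical keyword list, chosen for illustration; the thesis does not specify one.
SUSPICIOUS_KEYWORDS = ("login", "verify", "secure", "update", "account")


def extract_features(url):
    """Return a few lexical features of a URL as a flat dictionary."""
    host = urlparse(url).hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_dots_in_host": host.count("."),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        # Number of distinct suspicious keywords that appear anywhere in the URL.
        "keyword_hits": sum(kw in lowered for kw in SUSPICIOUS_KEYWORDS),
    }


def majority_vote(votes):
    """Combine per-model labels (1 = phishing, 0 = legitimate).

    Assumed combination rule: strict majority; a tie resolves to 0 (legitimate).
    """
    return int(sum(votes) * 2 > len(votes))


def collect_misclassified(urls, predicted, actual):
    """Return (url, true_label) pairs the model got wrong.

    These pairs are what a retraining step would append to the training set.
    """
    return [(u, a) for u, p, a in zip(urls, predicted, actual) if p != a]


if __name__ == "__main__":
    # example.com is a reserved documentation domain, not a real phishing URL.
    sample = "http://secure-login.example.com/verify?id=123"
    print(extract_features(sample))
    print(majority_vote([1, 1, 0]))
```

In practice, the feature dictionaries would be converted to numeric vectors and passed to trained models such as a decision tree or XGBoost classifier, whose individual predictions would replace the hard-coded votes shown here.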
dc.identifier.citation: Oduor, C. O. (2025). A Hybrid model for the detection of phishing URLs based on machine learning [Strathmore University]. https://hdl.handle.net/11071/16416
dc.identifier.uri: https://hdl.handle.net/11071/16416
dc.language.iso: en
dc.publisher: Strathmore University
dc.title: A Hybrid model for the detection of phishing URLs based on machine learning
dc.type: Thesis

Files

Original bundle

Name: A Hybrid model for the detection of phishing URLs based on machine learning.pdf
Size: 2.49 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission