|dc.description.abstract||Consumer credit risk scoring involves assessing the risk associated with a customer who applies for credit. The ability to confidently distinguish customers who will repay their credit from those who will not is therefore important for any lending institution. The purpose of this study is to model consumer credit risk using machine learning models and compare the results to the traditional logistic model. The aim is to identify whether machine learning algorithms improve the classification of default among customers. Additionally, the study aims to identify how different customer characteristics affect their default experience. The data were obtained from Kenya Metropol for the period 2014 to 2017 and include customer details such as age, loan amount, marital status and sex, among others. Five models are used to model the default experience: Logistic Regression, Random Forest, Support Vector Machine, Gradient Boosting and Multi-layer Perceptron Neural Network. The performance of the models was assessed using the following metrics: Accuracy, Precision, Recall, F1-score and the Precision-Recall curve. Because the credit data set is imbalanced, the F1-score, which is the harmonic mean of Precision and Recall, was used as the metric to determine the best model for credit scoring. The findings showed that Random Forest performed best, with an F1-score of 0.307.
The machine learning algorithms outperformed the logistic model and showed improved performance in the classification of default, especially in identifying false positives. It was also established that male customers had a higher default probability, younger customers were more likely to default, and single customers defaulted more than married customers.||en_US