Relevance of alternative data and machine learning in predicting default in a non-deposit-taking SACCO in Kenya

Date
2022
Authors
Juma, Silas Okeyo
Publisher
Strathmore University
Abstract
Credit risk is the most important and most difficult risk to manage in any financial institution. In Savings and Credit Co-operatives (SACCOs) in particular, credit risk is critical to financial performance because it directly determines whether the loans advanced contribute to profits or to losses. Widely used traditional credit-scoring methods, such as linear regression, discriminant analysis and judgement-based models, have been shown to give mixed and unreliable results. This is mainly because they consider a small number of linear variables and rely on the experience of the credit officer, which may be subjective. The purpose of this research was to examine the relevance of non-traditional (alternative) data and Machine Learning (ML) algorithms in predicting default in a selected large SACCO in Kenya. Using micro-level secondary data on 783 loans extracted from the SACCO's systems over one year (July 2018 to June 2019), Logistic Regression (LR) and Extreme Gradient Boosting (XGBoost) algorithms were implemented within an experimental research design. After hyperparameter tuning, the results show that when traditional data are combined with alternative data on borrower behavior, both LR and XGBoost predict default better than when traditional data are used alone. For LR, the Area Under the Curve (AUC), Accuracy, Precision and Recall improved by 5%, 12.1%, 12.9% and 1.43% respectively, while for XGBoost, AUC, Accuracy and Recall improved by 15%, 2.41% and 2.41%. Precision was unchanged for XGBoost. Overall, XGBoost outperformed LR. Further, the predictors of default are spread across both traditional and alternative features, with the alternative features appearing to improve the predictive power of the ML models.
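The four evaluation metrics reported above (AUC, Accuracy, Precision and Recall) can be made concrete with a minimal, standard-library Python sketch. It assumes that default is coded as 1 (the positive class), that each model outputs a probability of default, and that a 0.5 cut-off turns those probabilities into default/no-default predictions. The function names, the threshold and the toy labels and scores are hypothetical illustrations; they are not code or data from the study.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn), treating 1 = default as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1  # defaulter correctly flagged
        elif p == 1 and t == 0:
            fp += 1  # good borrower wrongly flagged
        elif p == 0 and t == 1:
            fn += 1  # defaulter missed
        else:
            tn += 1  # good borrower correctly passed
    return tp, fp, fn, tn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Share of loans flagged as defaults that actually defaulted.
    tp, fp, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fp) if tp + fp else 0.0


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Share of actual defaults that the model flagged.
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if tp + fn else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen defaulter receives a higher
    score than a randomly chosen non-defaulter (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one defaulter and one non-defaulter")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> dict[str, float]:
    """Score one model's predicted default probabilities on a hold-out set."""
    y_pred = [1 if s >= threshold else 0 for s in scores]
    return {
        "auc": roc_auc(y_true, scores),
        "accuracy": accuracy(y_true, y_pred),
        "precision": precision(y_true, y_pred),
        "recall": recall(y_true, y_pred),
    }


if __name__ == "__main__":
    # Hypothetical toy hold-out set: 1 = defaulted, 0 = repaid.
    labels = [1, 0, 1, 0, 0, 1]
    probs = [0.9, 0.2, 0.4, 0.6, 0.1, 0.8]
    print(evaluate(labels, probs))
```

In a comparison like the one described, the same `evaluate` call would be applied to each model's hold-out predictions twice, once for the traditional-only feature set and once for the traditional-plus-alternative feature set, so that the change in each metric reflects the added features rather than a change in how the scores are computed. AUC is computed from the raw probabilities and does not depend on the threshold, whereas Accuracy, Precision and Recall all shift if the cut-off is moved.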
The novelty of this approach lies in combining data previously considered irrelevant with ML algorithms that aim to reduce dimensionality in the data and improve accuracy in predicting the future behavior of borrowers. Unlike prior studies, this study took a pragmatic approach, simulating practical appraisal procedures for SACCOs in Kenya using scarce micro-level default data. The major challenges were the limited availability of financial data and legal and regulatory limitations on the use of private data. Future studies may consider other forms of alternative data, such as social media activity, unemployment data, average household incomes, mortgage uptake data, inflation rates and the consumer price index.
Description
A thesis submitted in partial fulfillment of the requirements for the Degree of Master of Commerce in Forensic Accounting at Strathmore University