Improving performance of hurdle models using Rare-Event Weighted Logistic Regression: application to maternal mortality data
Date
2022
Authors
Okello, Sharon Awuor
Publisher
Strathmore University
Abstract
Hurdle models, which are commonly used alongside zero-inflated models to analyze dispersed, zero-inflated count data, employ a logit link function to predict, from a set of covariates, whether an observation takes a positive count or a zero count. However, when positive counts are rare events, the logit model tends to be biased toward the majority zero class and may underestimate the positive counts when their proportion is much smaller than that of the zero counts. This research aimed to improve the performance of hurdle models by incorporating a rare-event weighted logistic regression (REWLR) model. Poisson and Negative Binomial (NB) Hurdle REWLR models were developed and fitted under various simulation conditions and to maternal mortality data, and their performance was evaluated using the Akaike Information Criterion (AIC) and the Area Under the Curve (AUC). The NB Hurdle REWLR model performed best among all the evaluated models, owing to its ability to handle dispersion and adjust for class imbalance. The research findings will provide reliable estimates of the maternal mortality ratio in Nairobi without the risk of over-fitting zero counts.
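As a rough illustration of the two ingredients named in the abstract, the sketch below writes out a Poisson hurdle probability mass function and a class-weighted Bernoulli log-likelihood for the zero/positive part. It is not the REWLR estimator developed in the thesis: the function names are hypothetical, and the inverse-frequency weights are an assumption chosen only to show how weighting offsets the dominance of the zero class.

```python
import math


def sigmoid(t):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def poisson_hurdle_pmf(y, p_positive, lam):
    """P(Y = y) under a Poisson hurdle model.

    p_positive: probability of crossing the hurdle (Y > 0),
                e.g. sigmoid(x'beta) from the logit part.
    lam:        rate of the zero-truncated Poisson for positive counts.
    """
    if y == 0:
        return 1.0 - p_positive
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    # Renormalise by 1 - P(Poisson = 0) because zeros are handled by the hurdle.
    return p_positive * poisson / (1.0 - math.exp(-lam))


def inverse_frequency_weights(z):
    """Illustrative class weights (an assumption, not the thesis's scheme).

    z is a list of 0/1 indicators (1 = positive count). Returns (w1, w0)
    such that both classes carry the same total weight.
    """
    n = len(z)
    n1 = sum(z)
    n0 = n - n1
    return n / (2.0 * n1), n / (2.0 * n0)


def weighted_log_likelihood(z, p, w1, w0):
    """Weighted Bernoulli log-likelihood for the zero/positive part."""
    return sum(
        w1 * zi * math.log(pi) + w0 * (1 - zi) * math.log(1.0 - pi)
        for zi, pi in zip(z, p)
    )


# Toy data: one positive count among ten observations (a rare event).
z = [0] * 9 + [1]
w1, w0 = inverse_frequency_weights(z)
p = [sigmoid(-2.0)] * len(z)  # same fitted probability for every observation
print(w1, w0, weighted_log_likelihood(z, p, w1, w0))
```

With these weights the single positive observation counts as much in the likelihood as the nine zeros combined, which is the basic mechanism by which weighting counteracts bias toward the majority class.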
Description
Submitted in partial fulfilment of the requirements for the degree of Master of Science in Statistical Sciences of Strathmore University