A Machine Learning Model to Predict Non-Revenue Water with Severely Unbalanced Classes
Every household, industry, institution, and organization needs clean water. In Kenya, water is used for human consumption, production, and agriculture, and its consumption contributes to the overall growth of the economy through water bills. Non-revenue water (NRW) is water that is produced but 'lost' before it reaches customers; equivalently, it is the difference between the volume initially released into the distribution network and the volume that reaches the final consumer for billing. According to the Public-Private Infrastructure Advisory Facility (PPIAF), an organization that fosters inter-agency cooperation to curb NRW, most NRW arises from physical losses, including burst pipes that are often the result of poor maintenance. PPIAF also notes numerous other sources of NRW, especially commercial losses arising from the way billing data is handled throughout the billing process. These include under-registration of customers' meter readings, data handling errors, theft, and illegal connections. Further causes include unbilled authorized consumption, such as water used for firefighting, water used by utilities for operational purposes, and water provided free of charge to specific groups. NRW therefore puts the country's revenue collection at risk, which can slow economic growth. This research proposes the development of a machine learning model for use by water service providers (WSPs). The model is intended to help WSPs reduce NRW by predicting the water consumption of different customers. To achieve this objective, we focus on providing tools and methods that guide WSPs in reducing NRW. Our model was trained on a two-year consumption dataset from Nairobi County.
The resulting model predicted customers' monthly consumption with an accuracy of 95%.
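The definition of NRW given above, the volume released into the distribution network minus the volume billed to consumers, can be illustrated with a minimal sketch. The function name, parameter names, and figures below are hypothetical and are not taken from the study or its dataset.

```python
from typing import Tuple


def non_revenue_water(system_input_m3: float, billed_m3: float) -> Tuple[float, float]:
    """Return (NRW volume, NRW share of system input) for one billing period.

    NRW is taken here as the volume released into the network minus the
    volume that reaches consumers for billing, per the definition above.
    """
    if system_input_m3 <= 0:
        raise ValueError("system input volume must be positive")
    if billed_m3 < 0:
        raise ValueError("billed volume cannot be negative")
    nrw_m3 = system_input_m3 - billed_m3
    return nrw_m3, nrw_m3 / system_input_m3


# Hypothetical figures for illustration only.
nrw_volume, nrw_share = non_revenue_water(system_input_m3=1000.0, billed_m3=600.0)
print(f"NRW: {nrw_volume:.0f} m3 ({nrw_share:.0%} of system input)")
```

Under this definition, unbilled authorized consumption such as firefighting water counts toward NRW, because it is released into the network but never billed.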
Submitted in partial fulfillment of the requirements for the degree of Master of Science in Information Technology at Strathmore University