Examining Gaussian Mixture Models using clustering algorithms

Oloo, J. M.

Files

Examining Gaussian Mixture Models using clustering algorithms.pdf (1.03 MB)

Date

2023

Authors

Oloo, J. M.

Publisher

Strathmore University

Abstract

Clustering is an important data mining technique for partitioning a dataset into groups that are internally homogeneous and mutually heterogeneous. Identifying these groups in a sales dataset is important for estimating demand for a specific range of products. This research carried out a detailed analysis of Gaussian Mixture Models (GMMs), using the expectation-maximization (EM) method to find optimal clusters in a sales dataset. The method combines the EM algorithm with agglomerative hierarchical clustering, resulting in an effective, iterative process for estimating the model’s parameters. To give accurate estimates of the ideal number of clusters, hierarchical clustering provides the initial guess for the EM algorithm. The goal is to boost the sales performance of products by estimating demand and comparing sales over a particular period. The method segmented clients into groups with shared characteristics, so that customers within each subgroup could be offered products and promotions likely to interest them. The study therefore sought to maximize the distance between clusters while minimizing the distance between items belonging to the same cluster. The research experimented with sales data from a large liquor distribution company, examining how variables such as product, customer, sales region, and quantity sold affected overall sales volume and revenue. To identify deviations in product sales, the dataset was split into subsets. Before data pre-processing and clustering, exploratory data analysis was used to understand the features of the data. To measure the performance of the clustering algorithm, the study used the Bayesian Information Criterion (BIC) as a goodness-of-fit metric. The results showed two distinct clusters, based on an analysis of 146 products and 223 customers from the dataset. These findings confirmed that Gaussian Mixture Models fitted with the EM algorithm are effective at estimating the underlying key parameters and identifying subgroups of similar products and customers.
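
The pipeline the abstract describes (hierarchical clustering as an initial guess, EM to fit the mixture, BIC to choose the number of clusters) can be illustrated with a minimal sketch. This is not the thesis's code: it assumes scikit-learn, Ward linkage, full covariance matrices, and synthetic two-dimensional data in place of the sales dataset, and the function names `fit_gmm_with_hierarchical_init` and `select_by_bic` are hypothetical.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.mixture import GaussianMixture


def fit_gmm_with_hierarchical_init(X, k, seed=0):
    """Fit a k-component GMM by EM, seeded from agglomerative clustering.

    Ward linkage and full covariances are assumptions of this sketch.
    """
    if k == 1:
        labels = np.zeros(len(X), dtype=int)
    else:
        labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(X)
    # Initial guess: per-cluster means and mixing proportions from the hierarchy.
    means = np.vstack([X[labels == j].mean(axis=0) for j in range(k)])
    weights = np.bincount(labels, minlength=k) / len(X)
    gmm = GaussianMixture(
        n_components=k,
        covariance_type="full",
        weights_init=weights,
        means_init=means,
        random_state=seed,
    )
    return gmm.fit(X)  # EM iterations run inside fit()


def select_by_bic(X, candidates=range(1, 6)):
    """Fit one model per candidate k and keep the one with the lowest BIC."""
    models = {k: fit_gmm_with_hierarchical_init(X, k) for k in candidates}
    bics = {k: m.bic(X) for k, m in models.items()}
    best_k = min(bics, key=bics.get)
    return best_k, models[best_k], bics


# Synthetic stand-in for the sales features: two well-separated groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (150, 2)), rng.normal(8.0, 1.0, (150, 2))])

best_k, model, bics = select_by_bic(X)
print("selected number of clusters:", best_k)
print("cluster sizes:", np.bincount(model.predict(X)))
```

In scikit-learn, `GaussianMixture.bic` returns a value where lower is better, which is why the sketch takes the minimum. On real sales data the features would typically be scaled first, since both Ward linkage and the Gaussian likelihood are sensitive to differences in units.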

Description

Full-text thesis

Citation

Oloo, J. M. (2023). Examining Gaussian Mixture Models using clustering algorithms [Strathmore University]. http://hdl.handle.net/11071/15389

URI

http://hdl.handle.net/11071/15389

Collections

MSc.SS Theses and Dissertations (2023)
