Examining Gaussian Mixture Models using clustering algorithms

Oloo, J. M.

dc.contributor.author	Oloo, J. M.
dc.date.accessioned	2024-03-14T09:30:52Z
dc.date.available	2024-03-14T09:30:52Z
dc.date.issued	2023
dc.description	Full-text thesis
dc.description.abstract	Clustering is an important data mining technique for finding homogeneous and heterogeneous groups in a dataset. Identifying these groups in a sales dataset is important for estimating demand for a specific range of products. This research carried out a detailed analysis of Gaussian Mixture Models, using the expectation-maximization (EM) method to find optimal clusters in a sales dataset. The method combines the EM algorithm with agglomerative hierarchical clustering, resulting in an effective, iterative process for estimating the model’s parameters. To give accurate estimates of the ideal number of clusters, hierarchical clustering provides the initial guess for the EM algorithm. The goal is to boost the sales performance of products by estimating demand and comparing sales over a particular period. The method segmented clients into groups with shared characteristics, so that customers within each subgroup could be offered products and promotions likely to interest them. The study therefore aimed to maximize the distance between individual clusters while minimizing the distance between items belonging to the same cluster. The research experimented with sales data from a large liquor distribution company, examining how variables such as product, customer, sales region, and quantity sold affected overall sales volume and revenue. To identify deviation in product sales, the dataset was split into subsets. Before data pre-processing and clustering, exploratory data analysis was used to understand the features of the data. To measure the performance of the clustering algorithm, the study used the Bayesian Information Criterion as a goodness-of-fit metric. The results yielded two distinct clusters from an analysis of 146 products and 223 customers in the dataset. These findings confirmed that Gaussian Mixture Models and the EM algorithm are effective at estimating the underlying key parameters and identifying subgroups of similar products and customers.
dc.identifier.citation	Oloo, J. M. (2023). Examining Gaussian Mixture Models using clustering algorithms [Strathmore University]. http://hdl.handle.net/11071/15389
dc.identifier.uri	http://hdl.handle.net/11071/15389
dc.language.iso	en_US
dc.publisher	Strathmore University
dc.title	Examining Gaussian Mixture Models using clustering algorithms
dc.type	Thesis
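
The workflow described in the abstract (hierarchical clustering to seed EM, then the Bayesian Information Criterion to compare candidate cluster counts) can be illustrated with a minimal sketch. This is not the thesis code: the data are synthetic one-dimensional values, the centroid linkage, the variance floor `min_var`, the fixed iteration count, and the helper names `agglomerative_init` and `fit_gmm` are all assumptions made for illustration, and the BIC uses the lower-is-better sign convention.

```python
import math
import random


def agglomerative_init(xs, k):
    """Centroid-linkage agglomerative clustering of 1-D data down to k groups."""
    clusters = [[x] for x in sorted(xs)]
    while len(clusters) > k:
        means = [sum(c) / len(c) for c in clusters]
        # For sorted 1-D data the closest pair of centroids is always adjacent.
        i = min(range(len(clusters) - 1), key=lambda j: means[j + 1] - means[j])
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm(xs, k, iters=100, min_var=1e-3):
    """Fit a 1-D Gaussian mixture with EM, seeded by agglomerative clustering."""
    n = len(xs)
    groups = agglomerative_init(xs, k)
    weights = [len(g) / n for g in groups]
    means = [sum(g) / len(g) for g in groups]
    variances = [max(sum((x - m) ** 2 for x in g) / len(g), min_var)
                 for g, m in zip(groups, means)]
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * normal_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj,
                min_var)
    log_lik = sum(
        math.log(max(sum(w * normal_pdf(x, m, v)
                         for w, m, v in zip(weights, means, variances)), 1e-300))
        for x in xs)
    n_params = 3 * k - 1  # k means + k variances + (k - 1) free weights
    bic = n_params * math.log(n) - 2 * log_lik  # lower is better here
    return {"weights": weights, "means": means, "variances": variances, "bic": bic}


# Synthetic stand-in for a sales feature: two well-separated groups.
random.seed(0)
data = ([random.gauss(10, 1) for _ in range(150)]
        + [random.gauss(30, 2) for _ in range(150)])

fits = {k: fit_gmm(data, k) for k in range(1, 5)}
bic_by_k = {k: fit["bic"] for k, fit in fits.items()}
best_k = min(bic_by_k, key=bic_by_k.get)
print(best_k, sorted(fits[best_k]["means"]))
```

Because the synthetic data contain two well-separated groups, the two-component fit should score a much lower BIC than the single-component fit. Real sales data would be multivariate and would need a full covariance treatment, which this sketch leaves out.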

Files

Original bundle

Name: Examining Gaussian Mixture Models using clustering algorithms.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format
Description: Full-text thesis

Collections

MSc.SS Theses and Dissertations (2023)