Examining Gaussian Mixture Models using clustering algorithms

dc.contributor.author: Oloo, J. M.
dc.date.accessioned: 2024-03-14T09:30:52Z
dc.date.available: 2024-03-14T09:30:52Z
dc.date.issued: 2023
dc.description: Full-text thesis
dc.description.abstract: Clustering is an important data mining technique for finding homogeneous and heterogeneous groups in a dataset. Identifying these groups in a sales dataset is important for estimating demand for a specific range of products. This research carried out a detailed analysis of Gaussian Mixture Models, using the expectation-maximization (EM) method to find optimal clusters in a sales dataset. The method combines the expectation-maximization algorithm with agglomerative hierarchical clustering, resulting in an effective, iterative process for estimating the model’s parameters. To give accurate estimates of the ideal number of clusters, hierarchical clustering provides the initial guess for the expectation-maximization algorithm. The goal is to boost the sales performance of products by estimating demand and comparing sales over a particular period. The method segmented clients into groups with shared characteristics, so that customers within each subgroup could be offered products and promotions likely to interest them. The study therefore aimed to maximize the distance between clusters while minimizing the distance between items belonging to the same cluster. The research experimented with sales data from a large liquor distribution company, examining how variables such as product, customer, sales region, and quantity sold affected overall sales volume and revenue. To identify deviation in product sales, the dataset was split into subsets. Before data pre-processing and clustering, exploratory data analysis was used to understand the features of the data. To measure the performance of the clustering algorithm, the study used the Bayesian Information Criterion as a goodness-of-fit metric. The analysis, which covered 146 products and 223 customers from the dataset, produced two distinct clusters. These findings confirmed that Gaussian Mixture Models and the EM algorithm are effective at estimating the underlying key parameters and identifying subgroups of similar products and customers.
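The pipeline the abstract describes (agglomerative clustering to seed EM, then the Bayesian Information Criterion to choose the number of components) can be sketched roughly as below. This is not the thesis's code. It is a minimal standard-library illustration that assumes one-dimensional data, centroid linkage, a variance floor to avoid degenerate components, and synthetic values in place of the sales data; every function name and parameter is hypothetical.

```python
import math
import random


def agglomerative_groups(xs, k):
    """Centroid-linkage agglomerative clustering for 1-D data (assumed linkage).

    On sorted 1-D data the two closest centroids are always neighbours,
    so only adjacent pairs need to be compared.
    """
    clusters = [[x] for x in sorted(xs)]
    while len(clusters) > k:
        means = [sum(c) / len(c) for c in clusters]
        i = min(range(len(clusters) - 1), key=lambda j: means[j + 1] - means[j])
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return clusters


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def log_likelihood(xs, weights, means, variances):
    total = 0.0
    for x in xs:
        dens = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
        total += math.log(max(dens, 1e-300))  # guard against log(0)
    return total


def fit_gmm(xs, k, n_iter=100, tol=1e-8):
    """Fit a 1-D, k-component GMM by EM, seeded from the hierarchical groups."""
    n = len(xs)
    grand_mean = sum(xs) / n
    # Illustrative variance floor: 0.1% of the overall variance.
    var_floor = max(1e-3 * sum((x - grand_mean) ** 2 for x in xs) / n, 1e-12)
    groups = agglomerative_groups(xs, k)
    weights = [len(g) / n for g in groups]
    means = [sum(g) / len(g) for g in groups]
    variances = [max(sum((x - m) ** 2 for x in g) / len(g), var_floor)
                 for g, m in zip(groups, means)]
    log_lik = log_likelihood(xs, weights, means, variances)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj, var_floor)
        new_log_lik = log_likelihood(xs, weights, means, variances)
        converged = new_log_lik - log_lik < tol
        log_lik = new_log_lik
        if converged:
            break
    return weights, means, variances, log_lik


def bic(n, k, log_lik):
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return n_params * math.log(n) - 2 * log_lik


def select_k(xs, k_max=4):
    """Return the k with the lowest BIC, plus all BIC scores."""
    scores = {}
    for k in range(1, k_max + 1):
        *_, log_lik = fit_gmm(xs, k)
        scores[k] = bic(len(xs), k, log_lik)
    return min(scores, key=scores.get), scores


# Synthetic stand-in for the sales data: two well-separated groups.
random.seed(0)
synthetic = ([random.gauss(20, 2) for _ in range(150)]
             + [random.gauss(50, 2) for _ in range(150)])
best_k, scores = select_k(synthetic)
print(best_k, scores)
```

Lower BIC is better, so `select_k` returns the component count that best balances fit against the number of free parameters. A real analysis of multivariate sales records would need full covariance matrices and an established library rather than this one-dimensional sketch.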
dc.identifier.citation: Oloo, J. M. (2023). Examining Gaussian Mixture Models using clustering algorithms [Strathmore University]. http://hdl.handle.net/11071/15389
dc.identifier.uri: http://hdl.handle.net/11071/15389
dc.language.iso: en_US
dc.publisher: Strathmore University
dc.title: Examining Gaussian Mixture Models using clustering algorithms
dc.type: Thesis
Files
Original bundle
Name: Examining Gaussian Mixture Models using clustering algorithms.pdf
Size: 1.03 MB
Format: Adobe Portable Document Format
Description: Full-text thesis
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission