Survey automation - a machine learning approach for question classification and open-ended response analysis
Publisher
Strathmore University
Abstract
Organizations increasingly rely on survey data to guide decision-making, yet manual analysis approaches, which often involve spreadsheets and human coding, are time-intensive, error-prone, and difficult to scale, particularly for open-ended responses. This project developed an end-to-end automated solution to improve the accuracy and efficiency of survey analysis by integrating machine learning, natural language processing (NLP), and automated report generation. A labeled dataset of 5,785 survey questions (3,272 open-ended and 2,513 closed-ended) was used to train four classifiers, which were compared on accuracy: Logistic Regression (97.99%), Naive Bayes (97.69%), Random Forest (97.90%), and Support Vector Machine (SVM). SVM achieved the best performance, with a cross-validation accuracy of 98.51% and a test accuracy of 98.36%. For topic modeling of open-ended responses, 100,000 entries from the IBM Employee Reviews dataset were analyzed using five techniques. BERTopic recorded the highest topic coherence score (0.6148), outperforming NMF (0.6028), LDA (0.5449), TF-IDF + KMeans (0.5426), and GloVe-based clustering (0.4731). The solution was deployed as a web-based application with modules for data preprocessing, classification, topic modeling, and visualization. Users can upload raw survey files and automatically receive a fully annotated PowerPoint report, complete with narrative insights generated using GPT-3.5. This automation improves the consistency and depth of analysis and eliminates manual bottlenecks, making the solution scalable and practical for organizations that handle frequent or large-scale surveys and enabling faster decision cycles.
Keywords: Survey Automation, Machine Learning, NLP, Topic Modeling, BERTopic, SVM, TF-IDF, Report Generation, Open-Ended Responses
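The question-classification step described in the abstract can be illustrated with a minimal sketch. It assumes scikit-learn, TF-IDF features, and a linear-kernel SVM. The abstract does not state the kernel, the feature settings, or the training data format, so those choices, the toy questions, and the helper name `build_question_classifier` are illustrative assumptions rather than the thesis implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy labeled questions (invented for illustration; the thesis used 5,785).
questions = [
    "Please describe your experience with our service.",
    "Why did you choose this product?",
    "What could we improve about the product?",
    "Please explain the reasons for your rating.",
    "Describe any challenges you faced during onboarding.",
    "What suggestions would help the team?",
    "How satisfied are you on a scale of 1 to 5?",
    "Do you agree or disagree with the statement?",
    "Which age group do you belong to?",
    "Did you attend the training, yes or no?",
    "Select your department from the list.",
    "Rate the quality of support from 1 to 10.",
]
labels = ["open-ended"] * 6 + ["closed-ended"] * 6


def build_question_classifier():
    """Hypothetical helper: TF-IDF features fed into a linear-kernel SVM."""
    return make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))


# Fit on the toy data and classify two unseen questions.
model = build_question_classifier()
model.fit(questions, labels)

new_questions = [
    "Please describe any challenges.",
    "Do you agree or disagree, yes or no?",
]
predictions = model.predict(new_questions)
for question, label in zip(new_questions, predictions):
    print(f"{label}: {question}")
```

On a dataset of realistic size, the same pipeline could be passed to `sklearn.model_selection.cross_val_score` to obtain a cross-validation accuracy comparable in kind to the figures reported above. The toy data here is far too small for such a score to be meaningful.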
Description
Full-text thesis
Citation
Maina, G. K. (2025). Survey automation—A machine learning approach for question classification and open-ended response analysis [Strathmore University]. https://hdl.handle.net/11071/16393