Survey automation - a machine learning approach for question classification and open-ended response analysis
| Field | Value |
|---|---|
| dc.contributor.author | Maina, G. K. |
| dc.date.accessioned | 2026-04-19T15:21:58Z |
| dc.date.issued | 2025 |
| dc.description | Full-text thesis |
| dc.description.abstract | Organizations increasingly rely on survey data to guide decision-making, yet manual analysis approaches, which often involve spreadsheets and human coding, are time-intensive, error-prone, and difficult to scale, particularly for open-ended responses. This project developed an end-to-end automated solution to improve the accuracy and efficiency of survey analysis by integrating machine learning, natural language processing (NLP), and automated report generation. A labeled dataset of 5,785 survey questions (3,272 open-ended and 2,513 closed-ended) was used to train four classifiers, which were compared on accuracy: Logistic Regression (97.99%), Naive Bayes (97.69%), Random Forest (97.90%), and Support Vector Machine (SVM). The SVM performed best, with a cross-validation accuracy of 98.51% and a test accuracy of 98.36%. For topic modeling of open-ended responses, 100,000 entries from the IBM Employee Reviews dataset were analyzed using five techniques. BERTopic achieved the highest topic coherence score (0.6148), outperforming NMF (0.6028), LDA (0.5449), TF-IDF + KMeans (0.5426), and GloVe-based clustering (0.4731). The solution was deployed as a web-based application with modules for data preprocessing, classification, topic modeling, and visualization. Users can upload raw survey files and automatically receive a fully annotated PowerPoint report, complete with narrative insights generated using GPT-3.5. This automation improves the consistency and depth of analysis and removes manual bottlenecks, making the approach scalable and practical for organizations that run frequent or large-scale surveys, and enabling faster decision cycles. Keywords: Survey Automation, Machine Learning, NLP, Topic Modeling, BERTopic, SVM, TF-IDF, Report Generation, Open-Ended Responses |
| dc.identifier.citation | Maina, G. K. (2025). Survey automation—A machine learning approach for question classification and open-ended response analysis [Strathmore University]. https://hdl.handle.net/11071/16393 |
| dc.identifier.uri | https://hdl.handle.net/11071/16393 |
| dc.language.iso | en_US |
| dc.publisher | Strathmore University |
| dc.title | Survey automation - a machine learning approach for question classification and open-ended response analysis |
| dc.type | Thesis |
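The abstract lists Naive Bayes among the four classifiers used to label survey questions as open-ended or closed-ended. The record does not describe the thesis's feature pipeline, so the following is only an illustrative, standard-library sketch of that kind of classifier. It assumes bag-of-words features, add-one (Laplace) smoothing, and a hypothetical eight-question toy training set. It is not the author's implementation, and it does not reproduce the SVM that performed best.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and keep alphanumeric word tokens (digits matter for "1 to 5" scales).
    return re.findall(r"[a-z0-9]+", text.lower())


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # label -> number of questions
        self.word_counts = defaultdict(Counter)    # label -> token -> count
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each known token.
            score = math.log(count / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # tokens never seen in training are ignored
                p = (self.word_counts[label][tok] + 1) / (self.total_words[label] + v)
                score += math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy data, not drawn from the thesis dataset.
texts = [
    "What do you like most about working here?",
    "Describe the challenges you face in your role.",
    "What would you improve about your team?",
    "Please share any other comments or suggestions.",
    "Do you feel valued at work? Yes or No",
    "On a scale of 1 to 5, how satisfied are you with your pay?",
    "Would you recommend this company to a friend? Yes or No",
    "Rate your manager on a scale of 1 to 5.",
]
labels = ["open"] * 4 + ["closed"] * 4

clf = MultinomialNB().fit(texts, labels)
print(clf.predict("Please describe what you would like to improve."))  # open
print(clf.predict("Rate your workload on a scale of 1 to 5."))          # closed
```

With a few thousand labeled questions, as in the thesis, the same fit/predict structure applies; a production pipeline would more likely use TF-IDF features and a library implementation so that the four classifiers can be cross-validated under identical conditions.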
Files
Original bundle
- Name: Survey automation - a machine learning approach for question classification and open-ended response analysis.pdf
- Size: 1.11 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed to upon submission