A prototype for information capture using natural language processing : a case of Nation online news

Mbuthia, Samuel K.
Strathmore University
Natural Language Processing is a crucial component in the study of artificial intelligence. The Internet contains an abundance of information from a multitude of sources, and much of this information is in human language, which is not easily interpretable by computers. This is especially the case with unstructured data. Newswire sources, which include news websites and other forms of digital news reporting, provide easily accessible sources of information. This information is reliable because in many cases it is provided by journalists who are governed by professional codes of conduct. Years of reporting news on the Internet have resulted in a large body of information that is free and readily available. However, this information cannot be utilized effectively because there is too much of it to be processed by conventional data mining techniques. This is where Natural Language Processing comes in as an essential tool for mining and processing the plethora of newswire content. This research took an applied approach to address the problem of inefficient utilization of newswire content. The research went through several stages, each addressing a specific area of the problem. While the problem being addressed was a global one, a focus on Kenyan news made the tasks manageable and specific, yet scalable to a larger setting. The research involved the creation of a novel prototype that addressed the gaps in existing systems and improved on how information processed by Natural Language Processing techniques could be presented and reported. The use of graphical representations, built with a number of advanced graphical libraries and application programming interfaces, and the application of machine learning models and algorithms were integral parts of the research. Various tests were undertaken to determine the viability of the prototype; key among them were performance tests that monitored the performance of the applied machine learning algorithms.
The results of the tests were tabulated and reported. While the research was largely successful, it was concluded that much more remains to be done to comprehensively address the gaps that exist in this area of research. As such, this research acts as a platform for further research that can gradually fill the gaps identified here as well as those that may be identified in the future.
Submitted in partial fulfillment of the requirements for the Degree of Master of Science in Computer-Based Information Systems
Prototype, Natural language, Nation online news, News, Information capture, Artificial intelligence, Natural Language Processing