Optimized terasort algorithm for data analytics: case of climate data analysis

dc.contributor.author: Matu, Fiona Mugure
dc.date.accessioned: 2017-11-21T10:17:28Z
dc.date.available: 2017-11-21T10:17:28Z
dc.date.issued: 2017
dc.description: Thesis submitted in partial fulfillment of the requirements for the Degree of Master of Science in Information Technology (MSIT) at Strathmore University [en_US]
dc.description.abstract: Weather forecasting has proven valuable in unravelling the causes of natural phenomena and in predicting future climatic conditions. The resulting information supports better preparation and policy making for these occurrences. Climatology is characterised by the analysis of vast amounts of data and therefore requires computing-intensive techniques such as numerical weather prediction (NWP). This made climate modelling a preserve of high performance computing (HPC) until the recent entrance of big data analytics. It is therefore necessary to optimize the algorithms used in the big data environment so that they give performance comparable to that offered by HPC environments. The study aimed to improve the big data MapReduce framework of analysis by optimizing the TeraSort benchmark algorithm. The proposed algorithm employed classical sort techniques and incorporated quantum computing mechanisms. Historical weather data collected at weather stations across the world was gathered and converted into an organised, human-readable format to serve as input to the program. The proposed algorithm, consisting of map, sort and reduction phases, transformed the bulky observational data into a compact summary of monthly temperature averages in linear time complexity. This is a significant improvement in performance in comparison to the TeraSort algorithm on a single node. The study concludes by suggesting areas that may be explored for further optimization, with emphasis on quantum computing capabilities. [en_US]
dc.identifier.uri: http://hdl.handle.net/11071/5634
dc.language.iso: en [en_US]
dc.publisher: Strathmore University [en_US]
dc.subject: Climate Modelling [en_US]
dc.subject: Classical Sorting Algorithms [en_US]
dc.subject: Quantum Theory [en_US]
dc.subject: Sorting Algorithm Testing [en_US]
dc.subject: National Climatic Data Center [en_US]
dc.title: Optimized terasort algorithm for data analytics: case of climate data analysis [en_US]
dc.type: Learning Object [en_US]
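
The map, sort and reduction phases named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the record layout (station id, ISO date, temperature), the function names and the year-range parameters are all hypothetical, and the sort step is a plain classical bucket sort on a bounded (year, month) key, chosen only because it runs in time linear in the number of records. No quantum computing mechanism is modelled.

```python
# Illustrative sketch only; record layout and all names are assumptions,
# not taken from the thesis.

def map_phase(records):
    """Emit ((year, month), temperature) pairs from raw observations."""
    for _station, date, temp in records:
        year, month, _day = date.split("-")
        yield (int(year), int(month)), temp


def sort_phase(pairs, first_year, last_year):
    """Bucket sort on the bounded (year, month) key: O(n + k) for k months."""
    n_buckets = (last_year - first_year + 1) * 12
    buckets = [[] for _ in range(n_buckets)]
    for (year, month), temp in pairs:
        buckets[(year - first_year) * 12 + (month - 1)].append(temp)
    # Walking the buckets in index order yields keys in chronological order.
    for i, temps in enumerate(buckets):
        if temps:
            yield (first_year + i // 12, i % 12 + 1), temps


def reduce_phase(grouped):
    """Collapse each month's temperatures into a single average."""
    for key, temps in grouped:
        yield key, sum(temps) / len(temps)


def monthly_averages(records, first_year, last_year):
    """Run map, sort and reduce in sequence over in-memory records."""
    pairs = map_phase(records)
    return list(reduce_phase(sort_phase(pairs, first_year, last_year)))
```

Calling `monthly_averages(records, 2016, 2017)` on a list of such tuples returns one `((year, month), average)` entry per month that has observations, in chronological order. The bucket sort needs the year range up front, which is the trade-off for avoiding a comparison sort.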
Files
Original bundle
Name: Optimized terasort algorithm for data analytics case of climate data analysis.pdf
Size: 1.81 MB
Format: Adobe Portable Document Format
Description: Fulltext thesis