A HDF5 data compression model for IoT applications

Chabari, Risper Nkatha
Strathmore University
The Internet of Things (IoT) has become an integral part of the modern digital ecosystem. According to current reports, more than 13.8 billion devices were connected as of 2021, and this figure is expected to surpass 30.9 billion devices by 2025. IoT devices will therefore become more prevalent and significant in our daily lives. Miniaturization of chipsets and modules has contributed to cost-effective and faster computer components. As a result of these technological advancements and mass adoption, the number of devices connected to the internet has been rising, generating data with high volume, velocity, veracity, and variety. The major challenge is the resulting data deluge, which makes it difficult to visualize, store, and analyse data generated in various formats. Relational databases such as MySQL have been widely used to store IoT data. However, they can only handle structured data, because the data is organized in tables with high consistency. NoSQL databases have also been adopted because they can store large volumes of data and do not rely on a relational schema or consistency requirements. This makes them suitable only for unstructured data. There is therefore a clear need for an effective way of storing and managing heterogeneous IoT data in a compressed and self-describing format. Furthermore, there is no one-size-fits-all approach to managing heterogeneous data in an IoT architecture. It is in this context that this research addressed the challenge by creating a tool that compresses heterogeneous data while saving it in HDF5 format. The input data consisted of .csv datasets. These data were parsed by the storage tool and data tool of HDF5 for compression and conversion. The tool achieved an 89.34% decrease in file size relative to the original file. The compressed output file was inspected in an external viewer, HDFView, to validate that the algorithm used was lossless.
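The two measurements reported above, the percentage decrease in size and the lossless check, can be illustrated with a minimal sketch. It is not the thesis tool: it uses Python's standard `zlib` module (DEFLATE, the algorithm behind HDF5's gzip filter) as a stand-in for the HDF5 pipeline, and the sample CSV, the column names, and the helper names `percent_reduction` and `make_sample_csv` are hypothetical. The figure it prints depends on the sample data and is unrelated to the 89.34% result.

```python
import csv
import io
import zlib


def percent_reduction(original_size: int, compressed_size: int) -> float:
    """Percentage decrease in size relative to the original."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return (original_size - compressed_size) / original_size * 100.0


def make_sample_csv(rows: int = 1000) -> bytes:
    """Build a small, hypothetical sensor-style CSV in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["device_id", "timestamp", "temperature_c"])
    for i in range(rows):
        writer.writerow([f"sensor-{i % 5}", 1_600_000_000 + i, 20 + (i % 10)])
    return buffer.getvalue().encode("utf-8")


raw = make_sample_csv()
packed = zlib.compress(raw, 9)      # DEFLATE stand-in, maximum compression level
restored = zlib.decompress(packed)  # round-trip back to the original bytes

# Lossless means the decompressed bytes match the input exactly.
is_lossless = restored == raw
reduction = percent_reduction(len(raw), len(packed))
print(f"lossless: {is_lossless}, size reduction: {reduction:.2f}%")
```

Under this definition, an 89.34% decrease means the compressed file occupies 10.66% of the original file's size.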
Submitted in partial fulfillment of the requirements for the degree of Master of Science in Information Technology at Strathmore University