Unlocking biomedical data for AI health research in Africa using GeneNetwork
Publisher
Strathmore University
Abstract
Genetic data analysis is essential for understanding biological processes and diseases. GeneNetwork (GN), an open-source platform with over 20 years of genetic and phenotypic data, relies on a complex relational database. However, the data is currently difficult to access and manipulate due to its complex underlying structures, including around 80 cross-referenced Structured Query Language (SQL) tables and various file types. This dissertation aimed to address the limitations of the GeneNetwork2 SQL database in representing and querying graph-like biological data by transforming it into the Resource Description Framework (RDF). A self-documenting domain-specific language (DSL) was developed using GNU Guile to automate the conversion of GN’s MariaDB SQL database into RDF triples. This involved defining ontologies, mapping SQL views to RDF, and storing the data in Virtuoso. The framework’s effectiveness was evaluated by comparing query performance and output quality between SQL and SPARQL. Results showed that RDF transformation significantly improved query efficiency and semantic richness. At a 99.9% confidence level, SPARQL queries executed statistically significantly faster than the equivalent SQL queries. Additionally, RDF’s structured representation enabled intuitive querying and better relationship discovery, as demonstrated in retrieving mouse species details and searching GeneRIF entries. In conclusion, transforming GN’s data into RDF made complex queries faster and enhanced its FAIR (Findable, Accessible, Interoperable, Reusable) properties, improving accessibility through semantic enrichment and interoperability with federated services for both human and machine agents. This transformation unlocks the full potential of the data, laying the groundwork for a more adaptable, AI-ready GN service and providing valuable insights for the broader application of RDF in biological and clinical data integration.
KEYWORDS: Artificial Intelligence, Data Accessibility, Data Interpretation, GeneNetwork, Biological Data, Data Discovery, Resource Description Framework (RDF), Metadata
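The SQL-to-RDF mapping step described in the abstract can be illustrated with a minimal sketch. It is not the thesis's GNU Guile DSL and does not use GeneNetwork's actual schema or ontologies: the `species` table, its columns, the `http://example.org/gn/` namespace, and the helper names are all hypothetical, and an in-memory SQLite database stands in for MariaDB. Each row becomes a subject IRI, and each non-null column becomes one N-Triples statement.

```python
import sqlite3

# Hypothetical namespace for illustration only; not GeneNetwork's real IRIs.
BASE = "http://example.org/gn/"


def escape_literal(value):
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def rows_to_ntriples(conn, table, key_column, columns):
    """Turn each row of `table` into N-Triples lines.

    The row's key forms the subject IRI; every non-null column becomes a
    predicate with a plain string literal as its object.
    Table and column names are assumed to be trusted identifiers.
    """
    query = f"SELECT {key_column}, {', '.join(columns)} FROM {table} ORDER BY {key_column}"
    triples = []
    for row in conn.execute(query):
        subject = f"<{BASE}{table}/{row[0]}>"
        for column, value in zip(columns, row[1:]):
            if value is None:  # SQL NULL: emit no triple at all
                continue
            triples.append(f'{subject} <{BASE}{column}> "{escape_literal(value)}" .')
    return triples


def build_demo_db():
    """Create a tiny in-memory table standing in for one relational table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE species (id INTEGER PRIMARY KEY, name TEXT, full_name TEXT)")
    conn.executemany(
        "INSERT INTO species VALUES (?, ?, ?)",
        [(1, "mouse", "Mus musculus"), (2, "rat", None)],
    )
    return conn


if __name__ == "__main__":
    for line in rows_to_ntriples(build_demo_db(), "species", "id", ["name", "full_name"]):
        print(line)
```

The resulting lines could be loaded into any triple store and queried with SPARQL. A real conversion would additionally map columns to terms from defined ontologies and assign datatypes, which this sketch leaves out.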
Description
Full-text thesis