A Multimodal Vision-Language framework for pulmonary embolism detection: integrating 3D CT analysis with medical LLM report generation

Tonui, S. C.

dc.contributor.author	Tonui, S. C.
dc.date.accessioned	2026-04-13T09:37:48Z
dc.date.issued	2025
dc.description	Full-text thesis
dc.description.abstract	Pulmonary embolism (PE) is a serious and life-threatening condition caused by an artery blockage in the lung. The prevalence of PE varies significantly, ranging from 0.14% to as high as 61.5%, depending on the patient population and medical setting. Mortality rates for those who develop PE are also concerning, with between 40% and 69.5% of patients dying from the condition. Particularly alarming is the case-fatality rate following surgery, where approximately 60% of patients who develop PE do not survive. Computed tomography pulmonary angiography (CTPA) is the standard imaging method for PE detection. Each scan generates hundreds of images that must be reviewed manually, making the process time-consuming and prone to overdiagnosis. In Kenya, this issue is further exacerbated by a significant shortage of radiologists, with only 1 radiologist for every 270,000 people, far below the recommended 10-12 radiologists per 100,000 people. This deficit delays diagnosis and significantly raises the risk of adverse outcomes, including higher mortality rates for patients. Artificial intelligence (AI) offers a promising solution by automating the image analysis process, reducing diagnostic delays, and supporting radiologists in decision-making. This study presents a deep learning-based pipeline for automated PE detection and radiology report generation. Our system integrates a CT-ViT (Vision Transformer for 3D Medical Image Processing) model to extract features from CTPA scans, followed by the MedItron-7B Large Language Model (LLM), which translates extracted insights into structured radiology reports. Additionally, a Visual Question Answering (VQA) module enhances clinical interpretability by enabling contextual queries on detected abnormalities.
Evaluation metrics indicate strong model performance in structured text generation, with peak ROUGE-1 recall reaching 1.0, while BLEU-1 and BLEU-4 scores of 0.35 and 0.22, respectively, highlight challenges in maintaining linguistic coherence. The results indicate the potential of AI-driven diagnostic tools in improving PE detection efficiency, reducing radiologist workload, and enhancing diagnostic accuracy. KEY WORDS: Pulmonary Embolism, PE, CTPA, CTViT, Deep Learning, LLM, Radiology.
dc.identifier.citation	Tonui, S. C. (2025). A Multimodal Vision-Language framework for pulmonary embolism detection: Integrating 3D CT analysis with medical LLM report generation [Strathmore University]. https://hdl.handle.net/11071/16377
dc.identifier.uri	https://hdl.handle.net/11071/16377
dc.language.iso	en
dc.publisher	Strathmore University
dc.title	A Multimodal Vision-Language framework for pulmonary embolism detection: integrating 3D CT analysis with medical LLM report generation
dc.type	Thesis

Files

Original bundle

Name: A Multimodal Vision-Language framework for pulmonary embolism detection - integrating 3D CT analysis with medical LLM report generation.pdf
Size: 15.31 MB
Format: Adobe Portable Document Format

Collections

MSc. DSA Theses and Dissertations (2025)