A Multimodal Vision-Language framework for pulmonary embolism detection: integrating 3D CT analysis with medical LLM report generation
| dc.contributor.author | Tonui, S. C. | |
| dc.date.accessioned | 2026-04-13T09:37:48Z | |
| dc.date.issued | 2025 | |
| dc.description | Full-text thesis | |
| dc.description.abstract | Pulmonary embolism (PE) is a serious and life-threatening condition caused by a blockage of an artery in the lung. The prevalence of PE varies significantly, ranging from 0.14% to as high as 61.5%, depending on the patient population and medical setting. Mortality rates for those who develop PE are also concerning, with between 40% and 69.5% of patients dying from the condition. Particularly alarming is the case-fatality rate following surgery, where approximately 60% of patients who develop PE do not survive. Computed tomography pulmonary angiography (CTPA) is the standard imaging method for PE detection. Each scan generates hundreds of images that must be reviewed manually, making the process time-consuming and prone to overdiagnosis. In Kenya, this issue is further exacerbated by a significant shortage of radiologists, with only 1 radiologist for every 270,000 people, far below the recommended 10-12 radiologists per 100,000 people. This deficit delays diagnosis and significantly raises the risk of adverse outcomes, including higher mortality rates for patients. Artificial intelligence (AI) offers a promising solution by automating the image analysis process, reducing diagnostic delays, and supporting radiologists in decision-making. This study presents a deep learning-based pipeline for automated PE detection and radiology report generation. Our system integrates a CT-ViT (Vision Transformer for 3D Medical Image Processing) model to extract features from CTPA scans, followed by the MedItron-7B Large Language Model (LLM), which translates the extracted features into structured radiology reports. Additionally, a Visual Question Answering (VQA) module enhances clinical interpretability by enabling contextual queries on detected abnormalities. Evaluation metrics indicate strong model performance in structured text generation, with peak ROUGE-1 recall reaching 1.0, while BLEU-1 and BLEU-4 scores of 0.35 and 0.22, respectively, highlight challenges in maintaining linguistic coherence. The results indicate the potential of AI-driven diagnostic tools to improve PE detection efficiency, reduce radiologist workload, and enhance diagnostic accuracy. Keywords: Pulmonary Embolism, PE, CTPA, CTViT, Deep Learning, LLM, Radiology. | |
| dc.identifier.citation | Tonui, S. C. (2025). A Multimodal Vision-Language framework for pulmonary embolism detection: Integrating 3D CT analysis with medical LLM report generation [Strathmore University]. https://hdl.handle.net/11071/16377 | |
| dc.identifier.uri | https://hdl.handle.net/11071/16377 | |
| dc.language.iso | en | |
| dc.publisher | Strathmore University | |
| dc.title | A Multimodal Vision-Language framework for pulmonary embolism detection: integrating 3D CT analysis with medical LLM report generation | |
| dc.type | Thesis |
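
The abstract reports a ROUGE-1 recall of 1.0 alongside much lower BLEU-1 and BLEU-4 scores. The sketch below is an illustration only, not the thesis's evaluation code: it shows how a generated report can contain every reference word (perfect unigram recall) while still scoring low on n-gram precision, the core term of BLEU. The two sentences and the helper names (`ngrams`, `rouge1_recall`, `modified_precision`) are hypothetical, tokenization is plain whitespace splitting, and BLEU's brevity penalty is omitted.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge1_recall(reference, candidate):
    """Share of reference unigrams that also occur in the candidate (clipped counts)."""
    ref = ngrams(reference.lower().split(), 1)
    cand = ngrams(candidate.lower().split(), 1)
    overlap = sum((ref & cand).values())
    return overlap / max(sum(ref.values()), 1)


def modified_precision(reference, candidate, n):
    """Clipped n-gram precision, the core term of BLEU-n (no brevity penalty)."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())
    return overlap / max(sum(cand.values()), 1)


# Hypothetical sentences, invented for illustration; not taken from the thesis.
reference = "filling defect in the right pulmonary artery"
candidate = ("there is a filling defect seen in the right main "
             "pulmonary artery of the lung")

recall = rouge1_recall(reference, candidate)      # every reference word is present
p1 = modified_precision(reference, candidate, 1)  # many extra words lower precision
p4 = modified_precision(reference, candidate, 4)  # no 4-word run matches exactly

print(f"ROUGE-1 recall: {recall:.2f}")
print(f"1-gram precision: {p1:.2f}")
print(f"4-gram precision: {p4:.2f}")
```

Recall only asks whether the reference words appear somewhere in the output, so longer, wordier reports are not penalized; precision over longer n-grams is sensitive to word order and inserted words, which is one plausible reading of the gap between the two metrics reported above.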
Files
Original bundle
- Name: A Multimodal Vision-Language framework for pulmonary embolism detection - integrating 3D CT analysis with medical LLM report generation.pdf
- Size: 15.31 MB
- Format: Adobe Portable Document Format
License bundle
- Name: license.txt
- Size: 1.71 KB
- Format: Item-specific license agreed upon to submission