Multimodal AI for clothing assistive solutions for the visually impaired
Publisher
Strathmore University
Abstract
This study presents an Artificial Intelligence (AI) powered Image-to-Text-to-Speech (ITTS) system to enhance accessibility for visually impaired individuals in the clothing domain. Using the DeepFashion2 dataset, the Bootstrapping Language-Image Pre-training (BLIP) model generated enriched captions, integrating metadata such as clothing scale, viewpoint, and category. These enriched captions were synthesized into audio using Google Text-to-Speech (gTTS), offering an accessible and descriptive experience. The system's performance was evaluated under zero-shot and fine-tuned settings, demonstrating substantial improvements in Bilingual Evaluation Understudy (BLEU) scores: BLEU-1 rose from 0.09 to 0.19, BLEU-2 from 0.04 to 0.07, and BLEU-3 from 0.02 to 0.04, while Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L) remained stable at 0.16 and Metric for Evaluation of Translation with Explicit Ordering (METEOR) improved from 0.09 to 0.13. Although Consensus-based Image Description Evaluation (CIDEr) scores remained at 0.0, the fine-tuned model excelled at generating contextually rich, descriptive captions owing to metadata integration. This study highlights the potential of multimodal AI systems to address accessibility challenges, providing a solution that empowers visually impaired users and lays the groundwork for future innovations in inclusive design.
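The BLEU-1 figures reported above are based on clipped unigram precision between a generated caption and a reference caption. As a minimal illustration only — a stdlib-only sketch with hypothetical clothing captions, omitting the brevity penalty, and not the evaluation code used in the study:

```python
from collections import Counter

def bleu1(candidate: str, reference: str) -> float:
    """Clipped unigram precision (BLEU-1 without the brevity penalty).

    Each candidate token is counted at most as many times as it
    appears in the reference; the sum is divided by candidate length.
    """
    cand_tokens = candidate.split()
    if not cand_tokens:
        return 0.0
    ref_counts = Counter(reference.split())
    cand_counts = Counter(cand_tokens)
    clipped = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    return clipped / len(cand_tokens)

# Hypothetical captions, for illustration only
reference = "a woman wearing a long sleeve blue dress front view"
generated = "a woman in a blue dress"
print(round(bleu1(generated, reference), 2))  # 5 of 6 unigrams match -> 0.83
```

Full BLEU as reported in the literature also multiplies in a brevity penalty and, for BLEU-2 and BLEU-3, geometric means over higher-order n-gram precisions; libraries such as NLTK provide reference implementations.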
Keywords: Multimodal AI, Assistive Reading, Digital Accessibility, Fashion Content, Image-to-Speech, Inclusive Design.
Description
Full-text thesis
Citation
Kathure, B. M. (2025). Multimodal AI for clothing assistive solutions for the visually impaired [Strathmore University]. https://hdl.handle.net/11071/16378