Multimodal AI for clothing assistive solutions for the visually impaired

Publisher

Strathmore University

Abstract

This study presents an Artificial Intelligence (AI) powered Image-to-Text-to-Speech (ITTS) system to enhance accessibility for visually impaired individuals in the clothing domain. Using the DeepFashion2 dataset, the Bootstrapping Language-Image Pre-training (BLIP) model generated enriched captions that integrate metadata such as clothing scale, viewpoint, and category. These enriched captions were synthesized into audio using Google Text-to-Speech (gTTS), offering an accessible, descriptive listening experience. The system's performance was evaluated under zero-shot and fine-tuned settings, demonstrating substantial improvements in Bilingual Evaluation Understudy (BLEU) scores: BLEU-1 rose from 0.09 to 0.19, BLEU-2 from 0.04 to 0.07, and BLEU-3 from 0.02 to 0.04, while Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L) remained stable at 0.16 and Metric for Evaluation of Translation with Explicit ORdering (METEOR) improved from 0.09 to 0.13. Although Consensus-based Image Description Evaluation (CIDEr) scores remained at 0.0, the fine-tuned model excelled at generating contextually rich, descriptive captions thanks to metadata integration. This study highlights the potential of multimodal AI systems to address accessibility challenges, offering a solution that empowers visually impaired users and lays the groundwork for future innovations in inclusive design.

Keywords: Multimodal AI, Assistive Reading, Digital Accessibility, Fashion Content, Image-to-Speech, Inclusive Design.
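As a minimal sketch of how the BLEU-n scores reported above are computed, the snippet below implements sentence-level BLEU against a single reference: the geometric mean of clipped n-gram precisions, scaled by a brevity penalty for short candidates. The example caption pair is hypothetical and not taken from the thesis; the thesis itself may use a library implementation (e.g. NLTK) rather than this hand-rolled version.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=1):
    """Sentence-level BLEU-n against a single reference:
    geometric mean of clipped n-gram precisions times a
    brevity penalty when the candidate is shorter than the reference."""
    precisions = []
    for n in range(1, max_n + 1):
        cand_counts = Counter(ngrams(candidate, n))
        ref_counts = Counter(ngrams(reference, n))
        # Clip each candidate n-gram count at its count in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        precisions.append(clipped / total if total else 0.0)
    if min(precisions) == 0.0:
        return 0.0  # no overlap at some n-gram order
    bp = (1.0 if len(candidate) >= len(reference)
          else math.exp(1.0 - len(reference) / len(candidate)))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

# Hypothetical generated caption vs. reference caption:
reference = "a long red dress photographed from the front".split()
candidate = "a red dress shown from the front".split()
print(round(bleu(candidate, reference, 1), 2))  # BLEU-1
print(round(bleu(candidate, reference, 2), 2))  # BLEU-2 (never above BLEU-1)
```

Higher-order scores fall off quickly because fewer bigrams and trigrams match, which is consistent with the pattern in the reported results (BLEU-1 > BLEU-2 > BLEU-3).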

Description

Full-text thesis

Citation

Kathure, B. M. (2025). Multimodal AI for clothing assistive solutions for the visually impaired [Strathmore University]. https://hdl.handle.net/11071/16378
