Bidirectional Translation of ASL and English Using Machine Vision and CNN and Transformer Networks

  • Stefanie Amiruzzaman
  • Md Amiruzzaman
  • Raga Mouni Batchu
  • James Dracup
  • Alexander Pham
  • Benjamin Crocker
  • Linh Ngo
  • M. Ali Akber Dewan
  • West Chester University
  • University of Delaware

Research output: Contribution to journal › Journal Article › peer-review

2 Citations (Scopus)

Abstract

This study presents a real-time, bidirectional system for translating American Sign Language (ASL) to and from English using computer vision and transformer-based models to enhance accessibility for deaf and hard-of-hearing users. Leveraging publicly available sign language and text-to-gloss datasets, the system integrates MediaPipe-based holistic landmark extraction with CNN- and transformer-based architectures to support translation across video, text, and speech modalities within a web-based interface. In the ASL-to-English direction, the sign-to-gloss model achieves a 25.17% word error rate (WER) on the RWTH-PHOENIX-Weather 2014T benchmark, which is competitive with recent continuous sign language recognition systems, and the gloss-level translation attains a ROUGE-L score of 79.89, indicating strong preservation of sign content and ordering. In the reverse English-to-ASL direction, the English-to-gloss transformer trained on ASLG-PC12 achieves a ROUGE-L score of 96.00, demonstrating high-fidelity gloss sequence generation suitable for landmark-based ASL animation. These results highlight a favorable accuracy-efficiency trade-off achieved through compact model architectures and low-latency decoding, supporting practical real-time deployment.
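The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes whitespace-tokenized gloss strings, WER defined as word-level edit distance divided by reference length, and ROUGE-L taken as the F1 score over the longest common subsequence. The function names and example glosses are hypothetical, and the abstract does not state which ROUGE-L variant or tokenization the authors used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: edit distance normalized by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, hypothesis):
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed variant)."""
    ref, hyp = reference.split(), hypothesis.split()
    lcs = lcs_length(ref, hyp)
    if lcs == 0:
        return 0.0
    precision = lcs / len(hyp)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gloss sequences, not taken from any dataset.
reference = "TOMORROW RAIN NORTH WIND STRONG"
hypothesis = "TOMORROW RAIN SOUTH WIND"
print(f"WER: {100 * wer(reference, hypothesis):.2f}%")
print(f"ROUGE-L: {100 * rouge_l_f1(reference, hypothesis):.2f}")
```

Both functions return fractions in the range where 1.0 corresponds to 100 on the scale used in the abstract; note that WER can exceed 1.0 when the hypothesis contains many insertions.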

Original language: English
Article number: 20
Journal: Computers
Volume: 15
Issue number: 1
Publication status: Published - Jan. 2026

Keywords

  • ASL translation
  • EfficientNet-B0 feature extraction
  • computer vision (MediaPipe)
  • gloss-to-sentence modeling
  • transformer models
