Project Overview
Handspeak is an AI-powered application that uses computer vision and machine learning to recognize and translate American Sign Language (ASL) in real time. It was inspired by my experience at BCHACKS 2024, where seeing another team's computer vision project led me to research the field on my own. What began as a school assignment in early 2025 has evolved into a comprehensive personal project that recognizes all 26 ASL letters, including the motion-based letters J and Z.
Key Features
- Real-time sign language recognition using a webcam
- Complete support for all 26 ASL letters, including the motion-based letters (J, Z)
- Advanced hand landmark detection and tracking
- Live translation to text and speech output
- Computer vision-based gesture recognition
- Cross-platform compatibility
Technical Implementation
The system combines OpenCV-based video capture and hand tracking with machine learning models for gesture classification. Each video frame is processed in real time to detect the hand, extract its landmarks, and map the gesture to the corresponding ASL letter, with special handling for the motion-based letters J and Z.
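A rough sketch of that loop is below; the classify helper is a hypothetical stand-in for the trained model, and the parameter values are illustrative rather than taken from the project:

```python
# Minimal real-time loop: OpenCV supplies frames, MediaPipe finds
# hand landmarks, and a placeholder classifier maps them to a letter.
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

def classify(landmarks):
    """Hypothetical stand-in for the trained letter classifier."""
    return "?"

with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.7) as hands:
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input; OpenCV captures BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for hand in results.multi_hand_landmarks:
                mp_draw.draw_landmarks(frame, hand, mp_hands.HAND_CONNECTIONS)
                letter = classify(hand.landmark)
                cv2.putText(frame, letter, (10, 40),
                            cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
        cv2.imshow("Handspeak", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```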
Key technical components include:
- MediaPipe for precise hand landmark detection
- Custom CNN model trained on ASL letter datasets
- Real-time video processing pipeline
- Motion tracking for dynamic gestures (J, Z), sketched after this list
- Text-to-speech integration (example after this list)
- User interface built with Tkinter
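Static letters can be classified from a single frame, but J (traced with the pinky) and Z (traced with the index finger) unfold over time. One plausible approach, sketched here as an illustration rather than the project's confirmed implementation, is to buffer fingertip positions over a short window and pass the trajectory to a sequence classifier:

```python
# Illustrative motion handling for J and Z: buffer index and pinky
# fingertip paths over recent frames, then classify the trajectory.
from collections import deque

WINDOW = 30  # roughly one second of frames at 30 fps (assumed)

class MotionTracker:
    def __init__(self, window=WINDOW):
        self.path = deque(maxlen=window)

    def update(self, landmarks):
        # MediaPipe hand landmark 8 is the index fingertip, 20 the pinky tip.
        index_tip, pinky_tip = landmarks[8], landmarks[20]
        self.path.append((index_tip.x, index_tip.y, pinky_tip.x, pinky_tip.y))

    def trajectory(self):
        return list(self.path)

def classify_motion(trajectory):
    """Hypothetical: a real version might feed the trajectory to a
    temporal model (LSTM/1D CNN) or match it against J/Z path templates."""
    return None
```

In the main loop, update() would run once per frame whenever a hand is detected, and classify_motion() would be consulted once the buffered path is long enough to distinguish a J or Z trace from a static pose.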
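For the speech output, one common offline choice in Python is pyttsx3; the project description doesn't name its TTS library, so this is an assumption for illustration:

```python
# Speaking recognized letters or words aloud. pyttsx3 is assumed
# here; the project may use a different text-to-speech library.
import pyttsx3

engine = pyttsx3.init()
engine.setProperty("rate", 150)  # speaking rate in words per minute

def speak(text: str) -> None:
    """Queue the text and block until it has been spoken."""
    engine.say(text)
    engine.runAndWait()

speak("Hello from Handspeak")
```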
Project Journey & Results
What started as a hackathon-inspired school assignment has grown into a full personal project demonstrating real-time computer vision and machine learning. By recognizing all 26 ASL letters, including the challenging motion-based J and Z, the system showcases the potential of AI in accessibility and inclusive technology.
The application has been tested under various lighting conditions and hand positions and performs reliably across different environments. The project represents a significant step in real-time gesture recognition and serves as a foundation for future expansion into a larger ASL vocabulary and sentence-level recognition.