ASL Recognizer
A lightweight, real-time recognizer for 24 static ASL letters that runs on a consumer webcam with CPU-only inference.

How the project is put together
6 layers / 5 directed links
- 01 Interface
  Live webcam preview with recognized ASL letter predictions.
  OpenCV window / Webcam overlay
- 02 Application
  Hand landmark extraction, normalized features, confidence controls, and prediction display.
  Python / Feature normalization / Threshold controls
- 03 Services/API
  Hand landmark detector and lightweight classifier for 24 static letters.
  MediaPipe / TensorFlow
- 04 Data
  Published landmark datasets and trained model artifacts support reproducibility.
  CSV landmarks / Kaggle / Hugging Face
- 05 Auth/Permissions
  No application auth; datasets and release artifacts are public.
  Open research release
- 06 Runtime
  Consumer webcam runtime with CPU-friendly model inference.
  Python / CPU inference / v1.0.0 release
From broken workflow to operating system
Before: ASL recognition meant research-only tools or expensive hardware.
After: the shipped v1.0.0 release recognizes 24 static letters in real time on a consumer webcam and CPU.
The workflow constraint
Existing ASL recognition tools are either research-only or require expensive hardware. A lightweight, real-time system was needed that runs on consumer hardware with just a webcam, using open-source ML frameworks.
What changed
Decisions and trade-offs
MediaPipe over custom CNN for hand detection
The system needs reliable hand landmark extraction that works across different skin tones and lighting conditions.
Decision: Used Google's MediaPipe Hands for pre-built, production-quality hand landmark detection, then trained a lightweight classifier on extracted landmarks.
Trade-off: Depends on MediaPipe accuracy, but dramatically reduces training data needs and computation compared to end-to-end CNN approaches.
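The detection step described above might look roughly like the sketch below. It assumes the legacy `mediapipe.solutions.hands` interface and OpenCV's BGR frame layout. The helper names (`make_hands`, `detect_first_hand`, `landmarks_to_points`) and the parameter values are illustrative and not taken from the project.

```python
def landmarks_to_points(hand_landmarks):
    """Convert one detected hand into a list of 21 (x, y, z) tuples.

    `hand_landmarks` is expected to expose a `.landmark` sequence whose items
    have `.x`, `.y`, `.z` attributes, as MediaPipe Hands results do.
    """
    return [(lm.x, lm.y, lm.z) for lm in hand_landmarks.landmark]


def make_hands():
    """Build a MediaPipe Hands detector (parameter values are assumptions)."""
    import mediapipe as mp  # imported lazily; only needed for live detection

    return mp.solutions.hands.Hands(
        static_image_mode=False,       # treat input as a video stream
        max_num_hands=1,               # one signing hand at a time
        min_detection_confidence=0.5,
    )


def detect_first_hand(frame_bgr, hands):
    """Return 21 landmark points for the first detected hand, or None."""
    import cv2  # imported lazily; only needed for live detection

    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
    results = hands.process(rgb)
    if not results.multi_hand_landmarks:
        return None
    return landmarks_to_points(results.multi_hand_landmarks[0])
```

In a capture loop, `detect_first_hand(frame, hands)` would be called once per frame, and frames where it returns `None` would be shown without a prediction.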
Landmark features over raw pixel classification
Raw image classification would require massive datasets and GPU training.
Decision: Extract 21 hand landmarks (63 features) and train a simple dense network, making the model lightweight and fast for real-time inference.
Trade-off: Loses some spatial information from raw images, but enables CPU-only real-time inference with high accuracy on static poses.
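As a minimal sketch of how 21 landmarks become a 63-value feature vector: the source does not specify the normalization scheme, so the one below (wrist-relative coordinates scaled by the farthest landmark) is an assumption, and `normalize_landmarks` is a hypothetical name.

```python
import math


def normalize_landmarks(points):
    """Turn 21 (x, y, z) landmark tuples into a flat list of 63 floats.

    Assumed scheme: translate so landmark 0 (the wrist) is the origin, then
    divide by the distance to the farthest landmark so hand size and position
    in the frame do not affect the features.
    """
    if len(points) != 21:
        raise ValueError(f"expected 21 landmarks, got {len(points)}")

    wx, wy, wz = points[0]
    centered = [(x - wx, y - wy, z - wz) for x, y, z in points]

    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered)
    if scale == 0.0:
        scale = 1.0  # degenerate input: all landmarks coincide

    return [coord / scale for point in centered for coord in point]
```

With this scheme the same hand pose yields the same features wherever it appears in the frame, which is one reason a small dense network can be enough for static poses.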
Constraints, architecture, and proof
Pipeline: OpenCV webcam capture → MediaPipe hand landmark detection (21 points × 3 coords) → feature normalization → TensorFlow dense classifier → real-time prediction overlay. Training pipeline: raw images → MediaPipe landmark extraction → CSV dataset → model training with validation split.
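The training side of the pipeline (CSV dataset, then a validation split) could be sketched as follows. The CSV layout here (label in the first column, 63 feature values after it, no header row) and the 20% validation fraction are assumptions. The published datasets may be organized differently.

```python
import csv
import random


def read_landmark_csv(handle):
    """Read rows shaped like `label,f0,...,f62` into (label, features) pairs."""
    rows = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        label = row[0]
        features = [float(value) for value in row[1:]]
        rows.append((label, features))
    return rows


def split_train_val(rows, val_fraction=0.2, seed=0):
    """Shuffle deterministically and split into (train, validation) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

A fixed seed keeps the split reproducible across runs, so a change in validation accuracy reflects the model and not a different split.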
Deployment, security, and maintenance
The confidence threshold can be adjusted at runtime with the + and - keys to trade precision against recall. The landmark datasets are published on Kaggle and Hugging Face for reproducibility.
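A minimal sketch of the threshold control, assuming a step of 0.05 and a label list of the 24 letters that need no motion (A to Y without J). The source names neither the step size nor the exact letters, so both are assumptions, as are the function names.

```python
# Assumed label order: 24 static letters (J and Z involve motion).
STATIC_LETTERS = list("ABCDEFGHIKLMNOPQRSTUVWXY")


def adjust_threshold(threshold, key, step=0.05):
    """Raise the threshold on '+', lower it on '-', clamped to [0.0, 1.0]."""
    if key == "+":
        threshold += step
    elif key == "-":
        threshold -= step
    return round(min(1.0, max(0.0, threshold)), 2)


def gate_prediction(probabilities, labels, threshold):
    """Return (label, confidence), with label None when below the threshold."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    confidence = probabilities[best]
    if confidence >= threshold:
        return labels[best], confidence
    return None, confidence
```

A higher threshold suppresses more uncertain frames (better precision, fewer letters shown), and a lower one shows more predictions at the cost of more mistakes. In an OpenCV loop, the key would typically come from `cv2.waitKey(1) & 0xFF` converted with `chr`.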