Confidential context: Product build / AI / ML / Live

ASL Recognizer

A real-time recognizer for 24 static ASL letters that runs on a consumer webcam with CPU-only inference.

How the project is put together

Architecture map

6 layers / 5 directed links, each labeled "feeds"
  1. Interface

    Live webcam preview with recognized ASL letter predictions.

    OpenCV window / Webcam overlay
  2. Application

    Hand landmark extraction, normalized features, confidence controls, and prediction display.

    Python / Feature normalization / Threshold controls
  3. Services/API

    Hand landmark detector and lightweight classifier for 24 static letters.

    MediaPipe / TensorFlow
  4. Data

    Published landmark datasets and trained model artifacts support reproducibility.

    CSV landmarks / Kaggle / Hugging Face
  5. Auth/Permissions

    No application auth; datasets and release artifacts are public.

    Open research release
  6. Runtime

    Consumer webcam runtime with CPU-friendly model inference.

    Python / CPU inference / v1.0.0 release

From broken workflow to operating system

Before: ASL letter recognition needed a tool that works outside the lab on ordinary hardware.

After: the shipped v1.0.0 release runs live recognition from a consumer webcam on CPU.

The workflow constraint

Existing ASL recognition tools are either research-only or require expensive hardware. A lightweight, real-time system was needed that runs on consumer hardware with just a webcam, using open-source ML frameworks.

What changed

Real-time ASL recognition for 24 static letters
Published dataset on Kaggle and Hugging Face
v1.0.0 release with live webcam inference
CPU-only inference with sub-100 ms latency

Decisions and trade-offs

MediaPipe over custom CNN for hand detection

The system needs reliable hand landmark extraction that works across different skin tones and lighting conditions.

Decision: Used Google's MediaPipe Hands for pre-built, production-quality hand landmark detection, then trained a lightweight classifier on extracted landmarks.

Trade-off: Recognition accuracy is bounded by MediaPipe's landmark accuracy, but this approach needs far less training data and computation than end-to-end CNN approaches.

Landmark features over raw pixel classification

Raw image classification would require massive datasets and GPU training.

Decision: Extract 21 hand landmarks (63 features) and train a simple dense network, making the model lightweight and fast for real-time inference.

Trade-off: Loses some spatial information from raw images, but enables CPU-only real-time inference with high accuracy on static poses.
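
The step from 21 landmarks to 63 features can be sketched as follows. The project does not document its exact normalization, so the scheme here (wrist-relative coordinates divided by the largest wrist-to-landmark distance) is an assumption, and `normalize_landmarks` is a hypothetical helper name. It uses only the standard library so it can be read without MediaPipe installed.

```python
import math

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per hand


def normalize_landmarks(landmarks):
    """Flatten 21 (x, y, z) landmarks into 63 features.

    Assumed scheme (not confirmed by the project): subtract the wrist
    (landmark 0), then divide by the largest wrist-to-landmark distance,
    so the features ignore where the hand is and how large it appears.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")

    wrist_x, wrist_y, wrist_z = landmarks[0]
    centered = [(x - wrist_x, y - wrist_y, z - wrist_z) for x, y, z in landmarks]

    # Largest distance from the wrist; fall back to 1.0 for a degenerate hand.
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered) or 1.0

    # Row-major flattening: x0, y0, z0, x1, y1, z1, ...
    return [coord / scale for point in centered for coord in point]
```

With this scheme, the same hand pose shifted or uniformly rescaled in the frame yields the same feature vector, which is one plausible way a small dense network could stay accurate across hand positions.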

Constraints, architecture, and proof

Pipeline: OpenCV webcam capture → MediaPipe hand landmark detection (21 points × 3 coords) → feature normalization → TensorFlow dense classifier → real-time prediction overlay. Training pipeline: raw images → MediaPipe landmark extraction → CSV dataset → model training with validation split.
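
The CSV and validation-split stages of the training pipeline might look roughly like the sketch below. The column layout (`label`, then `x0, y0, z0, ..., x20, y20, z20`), the 20% validation fraction, and all function names are assumptions for illustration; the published datasets may use a different schema.

```python
import csv
import io
import random

# Hypothetical column layout: label, then x0, y0, z0, ..., x20, y20, z20.
HEADER = ["label"] + [f"{axis}{i}" for i in range(21) for axis in ("x", "y", "z")]


def write_dataset(rows, stream):
    """Write (label, 63-feature list) pairs as CSV rows under HEADER."""
    writer = csv.writer(stream)
    writer.writerow(HEADER)
    for label, features in rows:
        if len(features) != 63:
            raise ValueError("each sample needs 63 features")
        writer.writerow([label, *features])


def read_dataset(stream):
    """Read the CSV back into (label, features) pairs."""
    reader = csv.reader(stream)
    next(reader)  # skip the header row
    return [(row[0], [float(v) for v in row[1:]]) for row in reader]


def train_val_split(samples, val_fraction=0.2, seed=0):
    """Shuffle deterministically and hold out a validation fraction."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Usage with an in-memory file and dummy samples.
buffer = io.StringIO()
write_dataset([("A", [0.0] * 63), ("B", [1.0] * 63)], buffer)
buffer.seek(0)
samples = read_dataset(buffer)
```

Storing landmarks rather than images keeps the dataset small and lets training run without repeating landmark extraction.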

Must run in real-time on consumer hardware with just a webcam
MediaPipe hand landmark extraction must be fast enough for live inference
Model must handle varying lighting conditions and hand positions
J and Z are motion-based letters, so they are excluded from the static classifier's scope
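
The last constraint fixes the label set at 24 classes. A minimal sketch of that label set, assuming the classifier's output indices follow alphabetical order (the project does not state its class ordering, and `index_to_letter` is a hypothetical helper):

```python
import string

# J and Z require motion, so the static classifier covers the other 24 letters.
STATIC_LETTERS = [c for c in string.ascii_uppercase if c not in "JZ"]


def index_to_letter(class_index):
    """Map a classifier output index to a letter (assumed alphabetical order)."""
    return STATIC_LETTERS[class_index]
```

Under this ordering, index 9 maps to K rather than J, which is an easy off-by-one to miss when decoding predictions.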

Deployment, security, and maintenance

The confidence threshold is adjustable at runtime with the +/- keys to trade precision against recall. The datasets are published on Kaggle and Hugging Face for reproducibility.
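
The threshold control could work along these lines. This is a sketch under stated assumptions: the 0.05 step size, the function names, and the choice to return `None` for low-confidence frames are all hypothetical, and `probabilities` stands in for the classifier's per-class output.

```python
def adjust_threshold(threshold, key, step=0.05):
    """Raise the threshold on '+', lower it on '-', clamped to [0.0, 1.0].

    The 0.05 step is an assumed default, not a documented project value.
    """
    if key == "+":
        threshold += step
    elif key == "-":
        threshold -= step
    # Round to two decimals so repeated key presses do not accumulate float drift.
    return round(min(1.0, max(0.0, threshold)), 2)


def gated_prediction(probabilities, labels, threshold):
    """Return (label, confidence) for the top class, or (None, confidence)
    when the top confidence falls below the threshold."""
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    confidence = probabilities[best]
    if confidence < threshold:
        return None, confidence
    return labels[best], confidence
```

In an OpenCV loop, the key would typically come from `cv2.waitKey`, which returns an integer key code that can be converted to a character before calling `adjust_threshold`. Raising the threshold suppresses uncertain letters (higher precision, lower recall); lowering it does the opposite.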

What I would improve next

Add temporal model for motion-based J and Z letters
Implement phrase-level recognition with word assembly
Add two-hand gesture support
Build web-based demo with TensorFlow.js