STEP 01: Upload & preprocess

Your video is uploaded to a GPU inference server. Frames are extracted, and a YOLO detector locates the lifter in each one so the later stages know exactly where to look.
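As a rough illustration of how a detection can be turned into a region for the pose model, here is a minimal sketch. It assumes the detector returns person boxes as (x1, y1, x2, y2) pixel tuples; the largest-box heuristic, the 10% margin, and the names pick_lifter_box and pad_and_clip are illustrative assumptions, not the service's actual code.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def pick_lifter_box(boxes: List[Box]) -> Box:
    """Pick the largest-area person box.

    Assumption: the lifter is the biggest person in the frame.
    """
    if not boxes:
        raise ValueError("no person detections in this frame")
    return max(boxes, key=lambda b: (b[2] - b[0]) * (b[3] - b[1]))


def pad_and_clip(box: Box, frame_w: int, frame_h: int,
                 margin: float = 0.1) -> Box:
    """Grow the box by `margin` of its size on each side, clipped to the frame."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * margin
    dy = (y2 - y1) * margin
    return (
        max(0.0, x1 - dx),
        max(0.0, y1 - dy),
        min(float(frame_w), x2 + dx),
        min(float(frame_h), y2 + dy),
    )
```

Padding the box leaves room for the bar and fully extended arms, which a tight person box can cut off.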

STEP 02: 2D pose estimation

ViTPose — a vision-transformer-based pose model — locates 17 anatomical keypoints (head, shoulders, elbows, wrists, hips, knees, ankles) per frame with sub-pixel accuracy. It's far more robust under occlusion and odd camera angles than the older OpenPose-style models.

STEP 03: 3D reconstruction

VideoPose3D lifts the 2D keypoints into a 3D skeleton sequence using a temporal convolutional network, so each 3D pose is estimated from several frames of context rather than just one. We then apply exponential moving average (EMA) smoothing across frames so the joints don't jitter.
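The EMA smoothing step can be sketched in a few lines. This is a minimal pure-Python version, assuming each frame is a list of (x, y, z) joint positions; the function name ema_smooth and the default alpha of 0.3 are illustrative assumptions, and the production smoothing factor is not stated here.

```python
from typing import List, Optional, Sequence

Frame = List[List[float]]  # one frame: a list of joints, each [x, y, z]


def ema_smooth(frames: Sequence[Frame], alpha: float = 0.3) -> List[Frame]:
    """Exponential moving average over time, applied per joint coordinate.

    smoothed[t] = alpha * raw[t] + (1 - alpha) * smoothed[t - 1]
    Smaller alpha means heavier smoothing (and more lag).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[Frame] = []
    prev: Optional[Frame] = None
    for frame in frames:
        if prev is None:
            # The first frame has no history, so it passes through unchanged.
            cur = [list(joint) for joint in frame]
        else:
            cur = [
                [alpha * c + (1.0 - alpha) * p for c, p in zip(joint, prev_joint)]
                for joint, prev_joint in zip(frame, prev)
            ]
        smoothed.append(cur)
        prev = cur
    return smoothed
```

The trade-off is between jitter and lag: a lower alpha gives a steadier skeleton but reacts more slowly to fast movement such as the turnaround at the bottom of a rep.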

STEP 04: Exercise-specific error scoring

For each lift (squat, bench, deadlift) we apply biomechanical thresholds: hip-crease vs knee for depth, frontal-plane knee tracking for valgus, pelvis tilt for butt wink, bar-path RMS for drift, etc. Each rep gets a score and a list of error tags.
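Two of these checks, squat depth and bar-path drift, might look roughly like the sketch below. It assumes heights and bar positions are in metres with the vertical axis pointing up, approximates the hip crease by the hip joint, and uses a hypothetical 5 cm drift limit; the tag names and the function names are illustrative, not the thresholds actually used.

```python
import math
from typing import List, Sequence


def squat_depth_reached(hip_heights: Sequence[float],
                        knee_heights: Sequence[float]) -> bool:
    """True if the hip drops below the knee in at least one frame of the rep."""
    return any(h < k for h, k in zip(hip_heights, knee_heights))


def bar_path_rms(bar_x: Sequence[float]) -> float:
    """RMS horizontal deviation of the bar from its mean position."""
    if not bar_x:
        raise ValueError("bar_x must not be empty")
    mean = sum(bar_x) / len(bar_x)
    return math.sqrt(sum((x - mean) ** 2 for x in bar_x) / len(bar_x))


def tag_squat_rep(hip_heights: Sequence[float],
                  knee_heights: Sequence[float],
                  bar_x: Sequence[float],
                  max_drift_m: float = 0.05) -> List[str]:
    """Return error tags for one rep (hypothetical tag names and limit)."""
    tags: List[str] = []
    if not squat_depth_reached(hip_heights, knee_heights):
        tags.append("shallow_depth")
    if bar_path_rms(bar_x) > max_drift_m:
        tags.append("bar_drift")
    return tags
```

For example, a rep whose hip never drops below knee height and whose bar swings 10 cm either side of its mean would come back tagged with both "shallow_depth" and "bar_drift".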

STEP 05: Report, routine & coach memory

You get a written rep-by-rep report, a scrubbable 3D replay, and a personalized warmup and cooldown. Errors that recur across your last 60 days of analyses add extra mobility drills on top of the static baseline routine. The AI coach chat shares the same memory.
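The way recurring errors could layer drills onto a baseline is sketched below. Everything specific here is a placeholder: the drill names, the tag names, the rule that an error must appear at least three times to count as recurring, and the function name build_warmup are all assumptions made for illustration. Only the 60-day window comes from the description above.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Dict, Iterable, List, Tuple

# Hypothetical placeholder routine and tag-to-drill mapping.
BASELINE: List[str] = ["general warmup A", "general warmup B"]
EXTRA_DRILLS: Dict[str, str] = {
    "knee_valgus": "knee-tracking drill",
    "butt_wink": "hip-mobility drill",
}


def build_warmup(history: Iterable[Tuple[date, List[str]]],
                 today: date,
                 window_days: int = 60,
                 min_occurrences: int = 3) -> List[str]:
    """Baseline routine plus one drill per error tag that recurs in the window."""
    cutoff = today - timedelta(days=window_days)
    counts = Counter(
        tag
        for day, tags in history
        if cutoff <= day <= today
        for tag in tags
    )
    routine = list(BASELINE)
    for tag, n in counts.most_common():
        if n >= min_occurrences and tag in EXTRA_DRILLS:
            routine.append(EXTRA_DRILLS[tag])
    return routine
```

Analyses older than the window are ignored, so an error that has been fixed eventually stops adding drills to the routine.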

Why 3D matters

A single 2D camera angle hides depth and rotation. A 3D skeleton lets the detector reason about the joint positions in space — so it can tell the difference between a knee actually caving inward and a knee that just looks that way because of camera perspective. Same goes for bar path, butt wink and uneven lockout.
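A small numeric sketch makes the perspective problem concrete. It assumes a lifter-centred coordinate system (x lateral, y up, z forward, in metres), a camera rotated about the vertical axis, and a simple orthographic projection onto the image x-axis; all function names and joint positions are made up for illustration.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def rotate_yaw(p: Vec, degrees: float) -> Vec:
    """Rotate a point about the vertical (y) axis, simulating an angled camera."""
    a = math.radians(degrees)
    x, y, z = p
    return (x * math.cos(a) + z * math.sin(a), y, -x * math.sin(a) + z * math.cos(a))


def knee_offset_2d(hip: Vec, knee: Vec, ankle: Vec) -> float:
    """Sideways knee offset as it appears in the image (x-coordinate only)."""
    return knee[0] - (hip[0] + ankle[0]) / 2.0


def knee_offset_3d(left_hip: Vec, right_hip: Vec, knee: Vec, ankle: Vec) -> float:
    """Knee offset measured along the lifter's own hip-to-hip axis."""
    axis = sub(right_hip, left_hip)
    norm = math.sqrt(dot(axis, axis))
    unit = (axis[0] / norm, axis[1] / norm, axis[2] / norm)
    mid = ((right_hip[0] + ankle[0]) / 2.0,
           (right_hip[1] + ankle[1]) / 2.0,
           (right_hip[2] + ankle[2]) / 2.0)
    return dot(sub(knee, mid), unit)


# Right leg with the knee travelling straight forward: no actual valgus.
left_hip, right_hip = (-0.15, 0.9, 0.0), (0.15, 0.9, 0.0)
knee, ankle = (0.15, 0.45, 0.25), (0.15, 0.0, 0.0)

# View the same pose from a camera placed 30 degrees off-axis.
cam = [rotate_yaw(p, 30.0) for p in (left_hip, right_hip, knee, ankle)]
apparent = knee_offset_2d(cam[1], cam[2], cam[3])
actual = knee_offset_3d(cam[0], cam[1], cam[2], cam[3])
```

In this setup the image shows the knee about 12.5 cm off the hip-ankle line, while the 3D measurement along the lifter's own hip axis is essentially zero, which is the distinction the paragraph above describes.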

What runs where

Pose estimation runs on a GPU server — your phone doesn't have to. Reports usually come back in under 30 seconds. Uploaded video files are auto-deleted within 1 hour; only the numeric analysis is kept for your history.

Try it on your own lift

Squat form check · Bench press form check · Deadlift form check · Lifting form blog