Score every frame, keep only the good ones.
A Python pipeline that walks a video frame-by-frame, scores each one with a no-reference image-quality model, persists every score to SQLite, then exports the highest-quality and most representative stills with FFmpeg. CUDA-aware inference does the heavy lifting; a hardened export path makes sure the right frames actually land on disk, with traceable filenames.
A video has thousands of frames. You want the handful worth keeping.
A single clip can hold thousands — sometimes millions — of frames, but only a small subset is actually useful for thumbnails, dataset curation, archiving, or manual review. Scrubbing footage by hand is slow and inconsistent, and naive fixed-interval sampling either misses the best-looking frames or over-selects visually weak stretches.
I built this as a repeatable pipeline instead of a one-off script. It evaluates every decoded frame with a no-reference image-quality model, stores the raw scores, ranks frames according to the metric's score direction (some metrics are higher-is-better, others lower-is-better), applies selection logic that avoids temporal clustering, and exports stills with enough metadata to trace each one back to its source video, frame number, timestamp, score, and rank. The hard constraint throughout: it has to run on real video files, with external binaries, codec metadata, variable frame counts, GPU memory pressure, and both 8-bit and higher-bit-depth sources, not controlled toy inputs.
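To make "selection logic that avoids temporal clustering" concrete, here is a minimal sketch of one way to do it: a greedy pass that takes frames best-first and skips any candidate too close to a frame already chosen. The function name select_frames, the min_gap parameter, and the higher_is_better flag are hypothetical and used only for illustration; the pipeline's real selection logic may differ.

```python
from __future__ import annotations

from typing import Sequence


def select_frames(
    scores: Sequence[float],
    top_k: int,
    min_gap: int,
    higher_is_better: bool = True,
) -> list[int]:
    """Greedily pick up to top_k frame indices, best score first,
    skipping any frame closer than min_gap frames to an earlier pick."""
    # Rank frame indices by score, honoring the metric's direction.
    ranked = sorted(
        range(len(scores)),
        key=lambda i: scores[i],
        reverse=higher_is_better,
    )
    picked: list[int] = []
    for idx in ranked:
        if len(picked) >= top_k:
            break
        # Reject candidates that fall inside the gap around any earlier pick.
        if all(abs(idx - p) >= min_gap for p in picked):
            picked.append(idx)
    return sorted(picked)


# Frames 10-12 form a high-scoring cluster; only the best of them survives.
scores = [0.1] * 30
scores[10], scores[11], scores[12], scores[25] = 0.90, 0.95, 0.92, 0.80
print(select_frames(scores, top_k=2, min_gap=5))  # [11, 25]
```

A plain top-k over the same scores would return two neighbors from the 10-12 cluster; the gap check trades a little raw score for coverage across the clip.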
Decode, score, persist, select, export.
SQLite is the source of truth — every frame is inspectable after the run.
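As an illustration of what "every frame is inspectable" can look like in practice, the sketch below stores one row per scored frame and queries the best ones back. The table name frame_scores, its columns, and the helper functions are assumptions made for this example, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per decoded frame and metric.
SCHEMA = """
CREATE TABLE IF NOT EXISTS frame_scores (
    video_path   TEXT    NOT NULL,
    frame_number INTEGER NOT NULL,  -- 0-indexed, matching the decoder
    timestamp_s  REAL    NOT NULL,
    metric       TEXT    NOT NULL,
    score        REAL    NOT NULL,
    PRIMARY KEY (video_path, frame_number, metric)
)
"""


def save_scores(conn, video_path, metric, rows):
    """Insert (frame_number, timestamp_s, score) tuples in one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO frame_scores VALUES (?, ?, ?, ?, ?)",
            [(video_path, n, ts, metric, s) for n, ts, s in rows],
        )


def top_frames(conn, video_path, metric, limit, higher_is_better=True):
    """Return the best-scoring (frame_number, score) pairs for one video."""
    order = "DESC" if higher_is_better else "ASC"
    cur = conn.execute(
        "SELECT frame_number, score FROM frame_scores "
        f"WHERE video_path = ? AND metric = ? ORDER BY score {order} LIMIT ?",
        (video_path, metric, limit),
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
save_scores(
    conn,
    "clip.mp4",
    "example_metric",
    [(0, 0.00, 0.41), (1, 0.04, 0.77), (2, 0.08, 0.63)],
)
print(top_frames(conn, "clip.mp4", "example_metric", limit=2))
# [(1, 0.77), (2, 0.63)]
```

Because the raw scores are persisted rather than only the winners, selection can be re-run with different thresholds or gap settings without scoring the video again.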
CUDA throughput, and an export path that refuses to lie.
Two boundaries where real systems break: GPU inference and FFmpeg on disk.
Making the GPU earn its keep
Scoring is the hot loop, so the inference side is built to stay fed. The assessor queries free VRAM and computes a dynamic batch size from frame resolution, dtype size, and a model-overhead factor, capped by the configured ceiling, so a 4K source and a 1080p source don't both get the same conservative batch. Automatic mixed precision (AMP) lowers memory pressure on CUDA, and pinned host memory plus non-blocking transfers keep host-to-device copies from stalling the device.
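The batch-size calculation can be sketched in plain arithmetic. The version below is an assumption-laden illustration, not the assessor's actual formula: the function name, the overhead_factor of 8, the usable_fraction headroom, and the ceiling of 64 are all made-up values. In a real run the free_bytes input could come from torch.cuda.mem_get_info(), which returns free and total device memory in bytes.

```python
def dynamic_batch_size(
    free_bytes: int,
    height: int,
    width: int,
    channels: int = 3,
    dtype_bytes: int = 2,          # 2 for float16 under AMP, 4 for float32
    overhead_factor: float = 8.0,  # assumed multiplier for activations and workspace
    usable_fraction: float = 0.8,  # assumed headroom so the device is never filled
    max_batch: int = 64,           # the configured ceiling
) -> int:
    """Estimate how many frames fit in free VRAM, clamped to [1, max_batch]."""
    # Approximate memory cost of one frame passing through the model.
    per_frame = height * width * channels * dtype_bytes * overhead_factor
    budget = free_bytes * usable_fraction
    return max(1, min(max_batch, int(budget // per_frame)))


free = 8 * 2**30  # pretend 8 GiB of VRAM is free
print(dynamic_batch_size(free, 1080, 1920))  # 64 (hits the ceiling)
print(dynamic_batch_size(free, 2160, 3840))  # 17
```

With these assumed constants a 1080p source is limited by the ceiling while a 4K source is limited by memory, which is the behavior the paragraph above describes.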
Inside the loop, all scoring runs under torch.inference_mode() to drop autograd overhead, and batched outputs are flattened once with a single detach().reshape(-1).float().cpu() instead of per-frame .item() calls — repeated .item() forces a sync every frame and quietly destroys throughput. The loop deliberately avoids routine torch.cuda.synchronize(), reserving explicit syncs for benchmarking only. One subtle bug worth flagging: a pyiqa metric wrapper can report training = True even when the underlying network is in eval mode, so the assessor explicitly calls eval() and pins known score-direction overrides — both locked down with tests.
Hardening the export path
Media export is the most failure-prone boundary in the whole pipeline, so it gets the most defensive code. FFmpeg select expressions are batched by a configurable size because a single giant select=eq(n\,123)+eq(n\,456)+... can blow past parser and command-length limits on Windows before producing anything. After each batch the exporter validates that exactly the expected number of files came out; if not, it deletes the partial temp files and falls back to one-frame-at-a-time export so the frame-number-to-file mapping is never ambiguous. There's a disk-space preflight that estimates final output bytes, peak temporary bytes, and a safety margin before anything is written, and GPU decode falls back to CPU when a single-frame export fails with acceleration enabled. Frame numbering is 0-indexed end to end — decoder, SQLite, and FFmpeg's n variable — which was verified with MD5 checks between batch and single-frame exports to rule out off-by-one errors.
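The batching of select expressions can be sketched as follows. This is an illustration under assumptions: the helper names, the batch size, and the output pattern are hypothetical, and the argument list shows only one plausible FFmpeg invocation rather than the exporter's actual command. Nothing here runs FFmpeg; it only builds the arguments.

```python
from __future__ import annotations


def batch_frames(frame_numbers: list[int], batch_size: int) -> list[list[int]]:
    """Split frame numbers into sorted, de-duplicated batches.

    Sorting matters: FFmpeg emits selected frames in decode order, so the
    k-th output file of a batch corresponds to its k-th smallest frame number.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ordered = sorted(set(frame_numbers))
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def build_select_expr(frame_numbers: list[int]) -> str:
    """Build eq(n\\,A)+eq(n\\,B)+... with commas escaped for the filtergraph."""
    return "+".join(f"eq(n\\,{n})" for n in frame_numbers)


def build_ffmpeg_args(video: str, frame_numbers: list[int], out_pattern: str) -> list[str]:
    """Assemble an argv list (no shell involved) for one export batch."""
    return [
        "ffmpeg", "-hide_banner",
        "-i", video,
        "-vf", f"select={build_select_expr(frame_numbers)}",
        # Keep only the selected frames without duplicating to fill gaps;
        # older FFmpeg builds spell this "-vsync 0".
        "-fps_mode", "passthrough",
        out_pattern,
    ]


def batch_is_complete(expected: list[int], produced_files: list[str]) -> bool:
    """A batch is trusted only if it produced exactly one file per frame."""
    return len(produced_files) == len(expected)


batches = batch_frames([789, 123, 456, 1213, 1011], batch_size=2)
print(batches)                        # [[123, 456], [789, 1011], [1213]]
print(build_select_expr(batches[0]))  # eq(n\,123)+eq(n\,456)
```

When batch_is_complete returns False, the caller would delete that batch's temporary files and re-export its frames one at a time, as described above, so each file maps to exactly one frame number.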