Score every frame, keep only the good ones.

A Python pipeline that walks a video frame-by-frame, scores each one with a no-reference image-quality model, persists every score to SQLite, then exports the highest-quality and most representative stills with FFmpeg. CUDA-aware inference does the heavy lifting; a hardened export path makes sure the right frames actually land on disk, with traceable filenames.

Python

PyTorch

SQLite

CUDA

FFmpeg

pyiqa

One score per frame (NR-IQA)

CUDA: AMP + pinned memory

SQLite: scores persisted

Best stills auto-exported

video-analysis

// The problem

A video has thousands of frames. You want the handful worth keeping.

A single clip can hold thousands — sometimes millions — of frames, but only a small subset is actually useful for thumbnails, dataset curation, archiving, or manual review. Scrubbing footage by hand is slow and inconsistent, and naive fixed-interval sampling either misses the best-looking frames or over-selects visually weak stretches.

I built this as a repeatable pipeline instead of a one-off script. It evaluates every decoded frame with a no-reference image-quality model, stores the raw scores, ranks frames according to the metric's score direction (whether higher or lower means better), applies selection logic that avoids temporal clustering, and exports stills with enough metadata to trace each one back to its source video, frame number, timestamp, score, and rank. The hard constraint throughout: it has to run on real video files, not controlled toy inputs, which means dealing with external binaries, codec metadata, variable frame counts, GPU memory pressure, and both 8-bit and higher-bit-depth sources.

// How it works

Decode, score, persist, select, export.

SQLite is the source of truth — every frame is inspectable after the run.

01 DECODE

PyAV/FFmpeg is the preferred decode path, with OpenCV as fallback. Frames come out as RGB tensors in batches — rgb24 for 8-bit sources, rgb48le for higher bit depths — each carrying frame number, timestamp, and dimensions.
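
A minimal sketch of the bookkeeping around decoding, assuming each decoded frame exposes a PTS and a stream time base (PyAV frames carry `pts` and `time_base`, and `frame.to_ndarray(format="rgb24")` returns the pixels). `FrameRecord`, `pick_rgb_format`, `pts_to_seconds`, and `batched` are hypothetical names, and the pixel data itself is left out so the example runs on the standard library alone.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class FrameRecord:
    """Per-frame metadata carried alongside the pixels (hypothetical type)."""
    frame_number: int            # 0-indexed, matching FFmpeg's `n`
    timestamp: Optional[float]   # seconds, or None if the frame has no PTS
    width: int
    height: int


def pick_rgb_format(bits_per_component: int) -> str:
    """8-bit sources decode to rgb24; anything deeper to rgb48le (16 bits per channel)."""
    return "rgb24" if bits_per_component <= 8 else "rgb48le"


def pts_to_seconds(pts: Optional[int], time_base: Fraction) -> Optional[float]:
    """Convert a stream PTS to seconds using the stream's time base."""
    return None if pts is None else float(pts * time_base)


def batched(records: Iterable[FrameRecord], batch_size: int) -> Iterator[List[FrameRecord]]:
    """Group frames into fixed-size batches; the final batch may be shorter."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    batch: List[FrameRecord] = []
    for rec in records:
        batch.append(rec)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch
```

For a 30 fps stream with a time base of 1/30, ten frames in batches of four come out as batches of 4, 4, and 2 frames.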

02 PREP

Batches are permuted to NCHW, optionally pinned before a non-blocking GPU transfer, then cast to float, normalized to [0, 1], and clamped before scoring. For metrics with their own internal resize, project-side pre-resize is disabled.
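
The prep step mostly comes down to dividing by the right maximum: 255 for rgb24 and 65535 for rgb48le. The sketch below shows that arithmetic and the NHWC to NCHW reorder using plain Python values; the helper names are hypothetical. On tensors, the same chain would look roughly like `batch.permute(0, 3, 1, 2).pin_memory().to(device, non_blocking=True).float().div(max_val).clamp(0, 1)`, but treat that as an illustration, not the project's code.

```python
from typing import List, Sequence, Tuple

# Bits per channel for the two decode formats described above.
BITS_PER_CHANNEL = {"rgb24": 8, "rgb48le": 16}


def max_code_value(pixel_format: str) -> int:
    """Largest channel value: 255 for rgb24, 65535 for rgb48le."""
    return (1 << BITS_PER_CHANNEL[pixel_format]) - 1


def nhwc_to_nchw_shape(shape: Tuple[int, int, int, int]) -> Tuple[int, int, int, int]:
    """Decoded batches are (N, H, W, C); the model expects (N, C, H, W)."""
    n, h, w, c = shape
    return (n, c, h, w)


def normalize(values: Sequence[float], pixel_format: str) -> List[float]:
    """Scale channel values to [0, 1], then clamp in case earlier float math overshot."""
    max_val = max_code_value(pixel_format)
    return [min(1.0, max(0.0, v / max_val)) for v in values]
```

Using the format's own maximum matters: dividing 16-bit data by 255 would push almost every pixel past 1.0, and the clamp would then silently flatten the image.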

03 SCORE

A pyiqa no-reference model runs under torch.inference_mode(), with AMP on CUDA. It handles scalar, vector, and [N, 1] output shapes and pins known score-direction overrides, returning one quality score per frame.
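
A sketch of the output handling, assuming the metric's batch output has already been moved to the CPU and converted to Python numbers (for example with `.tolist()`). pyiqa metric objects expose a `lower_better` attribute; the override table here is hypothetical and only shows where pinned directions would take precedence over the reported value.

```python
from numbers import Real
from typing import Dict, List, Sequence, Union


def flatten_scores(output: Union[Real, Sequence], expected: int) -> List[float]:
    """Turn a metric output into exactly one float per frame.

    Accepts a bare scalar (batch of one), a flat vector [N], or a column [N, 1],
    modelled here as plain Python numbers and lists.
    """
    if isinstance(output, Real):
        flat = [float(output)]
    else:
        flat = []
        for item in output:
            if isinstance(item, Real):
                flat.append(float(item))
            elif len(item) == 1 and isinstance(item[0], Real):
                flat.append(float(item[0]))
            else:
                raise ValueError(f"unsupported score element: {item!r}")
    if len(flat) != expected:
        raise ValueError(f"expected {expected} scores, got {len(flat)}")
    return flat


# Hypothetical override table (metric name -> lower_better). Only add entries
# verified against the metric's documentation; this one is a placeholder.
DIRECTION_OVERRIDES: Dict[str, bool] = {"example_metric": True}


def resolve_lower_better(metric_name: str, reported: bool) -> bool:
    """Prefer a pinned override over whatever the metric object reports."""
    return DIRECTION_OVERRIDES.get(metric_name, reported)


def best_first_key(score: float, lower_better: bool) -> float:
    """Sort key where ascending order always means best first."""
    return score if lower_better else -score
```

Checking the count against the batch size catches a shape mismatch at the scoring step, before a wrong number of scores can be written against the wrong frame numbers.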

04 PERSIST

Scores are buffered and batch-inserted into SQLite with a unique constraint on (video, frame). WAL mode, NORMAL sync, an enlarged cache, and memory temp store keep sequential writes cheap and the DB easy to inspect.
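
A minimal sketch of the persistence layer using Python's built-in `sqlite3`. The table and column names, the 64 MiB cache, the flush size, and the choice of `INSERT OR REPLACE` on conflict are assumptions; the pragmas correspond to the settings listed above.

```python
import sqlite3
from typing import List, Optional, Tuple

# (video, frame_number, timestamp_seconds, score)
Row = Tuple[str, int, Optional[float], float]

SCHEMA = """
CREATE TABLE IF NOT EXISTS frame_scores (
    video        TEXT    NOT NULL,
    frame_number INTEGER NOT NULL,
    timestamp    REAL,
    score        REAL    NOT NULL,
    UNIQUE (video, frame_number)
)
"""


def open_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")    # readers can inspect while the writer runs
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs; stays consistent under WAL
    conn.execute("PRAGMA cache_size=-65536")   # negative value is in KiB: about 64 MiB
    conn.execute("PRAGMA temp_store=MEMORY")   # temporary tables and indices stay in RAM
    conn.execute(SCHEMA)
    return conn


class ScoreWriter:
    """Buffers rows and writes each buffer in one transaction via executemany."""

    def __init__(self, conn: sqlite3.Connection, flush_every: int = 1000) -> None:
        self._conn = conn
        self._flush_every = flush_every
        self._buffer: List[Row] = []

    def add(self, row: Row) -> None:
        self._buffer.append(row)
        if len(self._buffer) >= self._flush_every:
            self.flush()

    def flush(self) -> None:
        if not self._buffer:
            return
        with self._conn:  # commits on success, rolls back on error
            # Re-scoring a frame replaces its row instead of tripping the UNIQUE constraint.
            self._conn.executemany(
                "INSERT OR REPLACE INTO frame_scores VALUES (?, ?, ?, ?)", self._buffer
            )
        self._buffer.clear()
```

`INSERT OR REPLACE` makes re-running a video idempotent at the row level; a stricter design could use a plain `INSERT` and treat a conflict as a bug. Remember to call `flush()` once more after the last frame.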

05 SELECT

Selection flags top-tier frames (percentile or top-percentage) and diverse-coverage frames spread across the timeline, with temporal deduplication, deterministic tie handling, and small-video backfill.
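
One way the three selection passes could fit together, sketched under assumptions: the top tier uses a top-percentage (not a percentile), coverage comes from equal-width slices of the frame range, temporal deduplication is a minimum frame gap, and backfill tops short clips up to a `min_count` floor. All names and defaults here are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Scored = Tuple[int, float]  # (frame_number, score)


def rank_order(frames: Sequence[Scored], lower_better: bool) -> List[Scored]:
    """Best first; equal scores fall back to the earlier frame, so runs are reproducible."""
    return sorted(frames, key=lambda f: (f[1] if lower_better else -f[1], f[0]))


def _far_enough(frame: int, chosen: Dict[int, str], min_gap: int) -> bool:
    """True if `frame` is at least `min_gap` frames from every frame already chosen."""
    return all(abs(frame - c) >= min_gap for c in chosen)


def select_frames(
    frames: Sequence[Scored],
    lower_better: bool = False,
    top_fraction: float = 0.05,
    n_buckets: int = 4,
    min_gap: int = 10,
    min_count: int = 3,
) -> Dict[int, str]:
    """Return {frame_number: reason}, where reason is 'top', 'coverage' or 'backfill'."""
    if not frames:
        return {}
    ranked = rank_order(frames, lower_better)
    chosen: Dict[int, str] = {}

    # 1) Top tier: the best top_fraction of frames, skipping near-duplicates in time.
    n_top = max(1, math.ceil(len(ranked) * top_fraction))
    for frame, _ in ranked:
        if len(chosen) >= n_top:
            break
        if _far_enough(frame, chosen, min_gap):
            chosen[frame] = "top"

    # 2) Coverage: split the timeline into equal slices; any slice without a pick
    #    gets its best-scoring frame that still respects the gap.
    lo = min(f for f, _ in frames)
    span = max(f for f, _ in frames) - lo + 1
    for b in range(n_buckets):
        start = lo + span * b // n_buckets
        end = lo + span * (b + 1) // n_buckets  # exclusive
        if any(start <= c < end for c in chosen):
            continue
        for frame, _ in ranked:  # ranked order, so the first match is the slice's best
            if start <= frame < end and _far_enough(frame, chosen, min_gap):
                chosen[frame] = "coverage"
                break

    # 3) Backfill: short clips can end up with too few picks after deduplication,
    #    so top them up from the ranking with the gap relaxed.
    want = min(min_count, len(ranked))
    for frame, _ in ranked:
        if len(chosen) >= want:
            break
        if frame not in chosen:
            chosen[frame] = "backfill"
    return chosen
```

On 40 frames whose score rises with the frame number, `top_fraction=1/64` and `min_gap=5` give one top pick (frame 39) plus coverage picks 9, 19, and 29, one per empty slice.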

06 EXPORT

FFmpeg select filters export the chosen frames in batches, validating the output count and falling back to single-frame export if it's wrong. Filenames embed source stem, frame number, rank, and score.
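
A sketch of how the batched select arguments and traceable names might be built. It assumes 0-indexed frame numbers (matching FFmpeg's `n`), outputs written to a temporary `image2` pattern that is numbered from 1, and that the k-th temporary file corresponds to the k-th selected frame in ascending order. That last assumption is exactly why the output count must be validated. `-fps_mode passthrough` requires FFmpeg 5.1 or newer; older builds use `-vsync 0`. The function names and filename layout are hypothetical.

```python
from pathlib import Path
from typing import Iterator, List, Sequence


def chunks(items: Sequence[int], size: int) -> Iterator[List[int]]:
    """Split the frame list so no single select expression grows unbounded."""
    for i in range(0, len(items), size):
        yield list(items[i:i + size])


def select_expr(frame_numbers: Sequence[int]) -> str:
    """Build e.g. select=eq(n\\,123)+eq(n\\,456) for 0-indexed frame numbers.

    The commas are backslash-escaped because ',' separates filters in a filtergraph.
    """
    return "select=" + "+".join(f"eq(n\\,{n})" for n in sorted(frame_numbers))


def ffmpeg_batch_args(src: Path, frame_numbers: Sequence[int], tmp_pattern: Path) -> List[str]:
    """Argument list for subprocess.run (no shell, so no shell quoting needed)."""
    return [
        "ffmpeg", "-hide_banner", "-loglevel", "error",
        "-i", str(src),
        "-vf", select_expr(frame_numbers),
        "-fps_mode", "passthrough",  # emit only the selected frames, no duplication
        str(tmp_pattern),            # e.g. tmp_%06d.png; image2 numbers files from 1
    ]


def still_name(stem: str, frame: int, rank: int, score: float, ext: str = "png") -> str:
    """Traceable filename: source stem, frame number, rank, and score."""
    return f"{stem}_f{frame:07d}_r{rank:04d}_s{score:.4f}.{ext}"
```

Each chunk would be run with `subprocess.run(args, check=True)`. Because the arguments are passed as a list rather than through a shell, only the filtergraph-level comma escaping is needed.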

// The interesting part

CUDA throughput, and an export path that refuses to lie.

Two boundaries where real systems break: GPU inference and FFmpeg on disk.

Making the GPU earn its keep

Scoring is the hot loop, so the inference side is built to stay fed. The assessor queries free VRAM and computes a dynamic batch size from frame resolution, dtype size, and a model-overhead factor — capped by the configured ceiling — so a 4K source and a 1080p source don't both get the same conservative batch. AMP lowers memory pressure on CUDA, and pinned host memory plus non-blocking transfers keep the copy from stalling the device.
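
The batch-size heuristic reduces to simple arithmetic. Here is a sketch that assumes 2 bytes per value under AMP, a hypothetical overhead factor of 8 to cover activations, and 80% of free memory as the budget. In practice `free_bytes` could come from `torch.cuda.mem_get_info()`, which returns `(free, total)` in bytes. None of these constants come from the project.

```python
def dynamic_batch_size(
    free_bytes: int,
    height: int,
    width: int,
    channels: int = 3,
    dtype_bytes: int = 2,          # fp16 under AMP (assumed)
    overhead_factor: float = 8.0,  # hypothetical allowance for model activations
    usable_fraction: float = 0.8,  # leave headroom for the allocator and other work
    max_batch: int = 64,           # configured ceiling
) -> int:
    """Frames per batch that should fit in free VRAM, clamped to [1, max_batch]."""
    per_frame = height * width * channels * dtype_bytes * overhead_factor
    budget = free_bytes * usable_fraction
    return max(1, min(max_batch, int(budget // per_frame)))
```

With 8 GiB free, these assumed constants give 64 (the ceiling) for 1080p and 17 for 4K.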

Inside the loop, all scoring runs under torch.inference_mode() to drop autograd overhead, and batched outputs are flattened once with a single detach().reshape(-1).float().cpu() instead of per-frame .item() calls — repeated .item() forces a sync every frame and quietly destroys throughput. The loop deliberately avoids routine torch.cuda.synchronize(), reserving explicit syncs for benchmarking only. One subtle bug worth flagging: a pyiqa metric wrapper can report training = True even when the underlying network is in eval mode, so the assessor explicitly calls eval() and pins known score-direction overrides — both locked down with tests.

Hardening the export path

Media export is the most failure-prone boundary in the whole pipeline, so it gets the most defensive code. FFmpeg select expressions are batched by a configurable size because a single giant select=eq(n\,123)+eq(n\,456)+... can blow past parser and command-length limits on Windows before producing anything. After each batch the exporter validates that exactly the expected number of files came out; if not, it deletes the partial temp files and falls back to one-frame-at-a-time export so the frame-number-to-file mapping is never ambiguous. There's a disk-space preflight that estimates final output bytes, peak temporary bytes, and a safety margin before anything is written, and GPU decode falls back to CPU when a single-frame export fails with acceleration enabled. Frame numbering is 0-indexed end to end — decoder, SQLite, and FFmpeg's n variable — which was verified with MD5 checks between batch and single-frame exports to rule out off-by-one errors.
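
Here is a sketch of two of those guards: the disk-space preflight, and the count check with single-frame fallback. The estimate treats temporary space as one batch of stills, adds a margin of 10% or 256 MiB (whichever is larger), and reads free space with `shutil.disk_usage`. `run_batch` and `run_single` are hypothetical callables standing in for the FFmpeg invocations. The GPU-to-CPU decode fallback is not shown.

```python
import shutil
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class SpaceEstimate:
    final_bytes: int      # all exported stills
    peak_temp_bytes: int  # one batch of temporary FFmpeg outputs
    margin_bytes: int     # safety headroom

    @property
    def required(self) -> int:
        return self.final_bytes + self.peak_temp_bytes + self.margin_bytes


def estimate_space(n_frames: int, bytes_per_still: int, batch_size: int,
                   margin_fraction: float = 0.1,
                   min_margin: int = 256 * 1024**2) -> SpaceEstimate:
    final = n_frames * bytes_per_still
    peak_temp = min(batch_size, n_frames) * bytes_per_still
    margin = max(min_margin, int(final * margin_fraction))
    return SpaceEstimate(final, peak_temp, margin)


def preflight(out_dir: Path, est: SpaceEstimate) -> None:
    """Refuse to start if the output volume cannot hold the estimate."""
    free = shutil.disk_usage(out_dir).free
    if free < est.required:
        raise OSError(f"need {est.required} bytes in {out_dir}, only {free} free")


def export_checked(frames: Sequence[int],
                   run_batch: Callable[[Sequence[int]], List[Path]],
                   run_single: Callable[[int], Path]) -> List[Path]:
    """Accept a batch only if it produced exactly one file per requested frame;
    otherwise delete what it wrote and redo the frames one at a time."""
    outputs = run_batch(frames)
    if len(outputs) == len(frames):
        return outputs
    for path in outputs:  # the frame-to-file mapping is ambiguous, so discard it
        path.unlink(missing_ok=True)
    return [run_single(f) for f in frames]
```

Falling back per batch rather than for the whole run keeps the slow single-frame path confined to the batches that actually misbehaved.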
