WebPageTest filmstrip diff automation for LCP
A byte budget can pass while the page still feels slower, because two builds can transfer identical bytes yet paint the hero in a different frame. WebPageTest closes that gap: it records a frame-by-frame filmstrip and computes Speed Index and LCP, so you can diff the visual progress of a candidate build against a stored baseline and see exactly which frame the media appears in. This guide, part of Monitoring & Regression for Media Delivery under CDN & Edge Media Delivery, shows how to drive the WebPageTest API from Node.js, pull the filmstrip and metrics, and turn a visual regression into a red CI check. It is the render-timeline complement to Lighthouse CI budget enforcement.
Prerequisite checklist
Warning: the public WebPageTest API is rate-limited and runs on shared agents whose queue depth varies through the day. For a gate that runs on every PR, use a private/self-hosted WPT instance or a paid API tier; otherwise queue time makes the job flap on timeouts rather than on real regressions.
What the filmstrip tells you that a budget can’t
Speed Index is the average time at which visible parts of the page are painted. It is computed by integrating the unpainted share of the viewport (100% minus the filmstrip's visual-completeness value) over time, so a lower Speed Index means content appears earlier. LCP marks the moment the largest element (usually the hero image) finishes painting, which maps to a single filmstrip frame. Diffing these between builds catches regressions that leave byte weight untouched: a hero that lost its fetchpriority hint, a lazy-loading attribute that crept onto the LCP image, or a render-blocking script inserted above it. None of those change total image bytes, but all of them push the hero's paint into a later frame.
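To make the integral concrete, here is a minimal sketch that approximates Speed Index from a list of filmstrip frames. The frame shape ({ timeMs, progress }) and the sample values are assumptions for illustration; WebPageTest computes the real metric for you, and this sketch assumes the first frame sits at 0 ms and the frames are sorted by time.

```javascript
// Sketch: approximate Speed Index from filmstrip frames.
// Assumes frames are sorted by time, start at 0 ms, and carry `progress`
// as 0-100 visual completeness (hypothetical shape for illustration).
function speedIndexFromFrames(frames) {
  let total = 0;
  for (let i = 1; i < frames.length; i++) {
    const prev = frames[i - 1];
    const dt = frames[i].timeMs - prev.timeMs;
    // The screen stays at prev.progress until the next frame is captured,
    // so each interval contributes its unpainted share times its duration.
    total += ((100 - prev.progress) * dt) / 100;
  }
  return total;
}

// Two made-up builds with identical end states but a hero that paints late.
const early = [
  { timeMs: 0, progress: 0 },
  { timeMs: 1000, progress: 80 }, // hero already painted in the 1000 ms frame
  { timeMs: 2000, progress: 100 },
];
const late = [
  { timeMs: 0, progress: 0 },
  { timeMs: 1000, progress: 20 }, // hero missing from the 1000 ms frame
  { timeMs: 2000, progress: 100 },
];

console.log(speedIndexFromFrames(early)); // 1200
console.log(speedIndexFromFrames(late)); // 1800
```

Both timelines finish at 2000 ms, yet the second scores 600 ms worse because the hero arrives one frame later. That is the kind of difference a byte budget cannot see.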
Exact solution
Step 1 — Submit a test and poll for completion
WebPageTest’s REST API is two calls: runtest.php submits and returns a testId; jsonResult.php returns the full result once the run finishes. Request multiple runs so you can take the median.
// wpt-run.mjs: submit a WebPageTest run and resolve with its JSON result.
const WPT_HOST = 'https://www.webpagetest.org';
const API_KEY = process.env.WPT_API_KEY;

export async function runTest(url) {
  if (!API_KEY) throw new Error('WPT_API_KEY is not set');
  const params = new URLSearchParams({
    url,
    k: API_KEY,
    f: 'json', // return JSON, not the HTML result page
    runs: '5', // FIVE runs so we can take a stable median
    fvonly: '1', // first-view only; repeat-view doubles cost, not needed for LCP
    location: 'Dulles:Chrome', // PIN the location: a different agent = different numbers
    connectivity: '4G', // PIN the network profile; must match the baseline exactly
    lighthouse: '1', // also compute Lighthouse metrics (gives a clean LCP)
    video: '1', // REQUIRED for filmstrip frames (and for Speed Index)
    medianMetric: 'SpeedIndex', // pick the median run by Speed Index (default is load time)
  });

  // Submit the test.
  const submit = await fetch(`${WPT_HOST}/runtest.php?${params}`).then(r => r.json());
  if (submit.statusCode !== 200) {
    throw new Error(`WPT submit failed: ${submit.statusText}`);
  }
  const { testId, jsonUrl } = submit.data;

  // Poll jsonResult until statusCode 200 (complete). 1xx = still queued or running.
  for (let attempt = 0; attempt < 60; attempt++) {
    const res = await fetch(jsonUrl).then(r => r.json());
    if (res.statusCode === 200) return res.data; // done
    if (res.statusCode >= 400) throw new Error(res.statusText);
    await new Promise(r => setTimeout(r, 5000)); // wait 5s between polls
  }
  throw new Error(`WPT test ${testId} did not complete in time`);
}
Step 2 — Extract the median metrics and filmstrip
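WebPageTest already nominates a median run. By default it chooses that run by load time; pass medianMetric=SpeedIndex when submitting if you want the median chosen by Speed Index instead. Pull the median run's metrics and filmstrip frames from the result tree.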
// wpt-extract.mjs: reduce a raw WPT result to the numbers we gate on.
export function extractMetrics(data) {
  // data.median.firstView is the run WPT chose as representative.
  const fv = data.median.firstView;
  return {
    // The result JSON names the test ID `id`; fall back to `testId` if present.
    testId: data.id ?? data.testId,
    // Speed Index (ms): integral of visual incompleteness over time; lower is better.
    speedIndex: fv.SpeedIndex,
    // LCP (ms). WebPageTest exposes it as chromeUserTiming.LargestContentfulPaint
    // or as the lighthouse LCP; prefer the Chrome trace value when present.
    lcp: fv['chromeUserTiming.LargestContentfulPaint']
      ?? fv['lighthouse.LargestContentfulPaint'],
    // Visual Complete (ms): when the above-fold stopped changing.
    visualComplete: fv.visualComplete,
    // Filmstrip: array of { time, image, VisuallyComplete } frames.
    frames: (fv.videoFrames ?? []).map(f => ({
      timeMs: f.time,
      progress: f.VisuallyComplete, // 0–100 visual completeness at this frame
      image: f.image, // URL of the frame thumbnail
    })),
  };
}
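Once LCP and the frames sit in one object, you can look up which filmstrip frame first shows the hero and link its thumbnail from the CI log. The sketch below assumes the frame shape returned by extractMetrics above. The helper name and sample data are hypothetical, and video timestamps are coarser than trace timestamps, so treat the match as approximate.

```javascript
// Sketch: find the filmstrip frame in which the LCP element first appears.
// Assumes `frames` is sorted by timeMs and shaped like extractMetrics() output.
function frameShowingLcp(frames, lcpMs) {
  // A frame shows the screen as it looked at its timestamp, so the first
  // frame captured at or after the LCP time is the first to include the hero.
  return frames.find(f => f.timeMs >= lcpMs) ?? null;
}

// Hypothetical sample data; the image names are placeholders, not real URLs.
const sampleFrames = [
  { timeMs: 0, progress: 0, image: 'frame_0000.jpg' },
  { timeMs: 800, progress: 35, image: 'frame_0800.jpg' },
  { timeMs: 1500, progress: 90, image: 'frame_1500.jpg' },
  { timeMs: 2300, progress: 100, image: 'frame_2300.jpg' },
];

const heroFrame = frameShowingLcp(sampleFrames, 1420);
console.log(heroFrame.timeMs, heroFrame.image); // 1500 frame_1500.jpg
```

The function returns null when LCP falls after the last captured frame, which usually means the video stopped recording before the hero painted.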
Step 3 — Diff candidate against baseline
Compare medians and flag the frame where visual progress diverges. A tolerance absorbs normal agent noise; anything beyond it is a real regression.
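// wpt-diff.mjs: compare a candidate run to a stored baseline.
export function diffRuns(baseline, candidate, tolerance = { si: 0.10, lcp: 0.10 }) {
  const siDelta = (candidate.speedIndex - baseline.speedIndex) / baseline.speedIndex;
  const lcpDelta = (candidate.lcp - baseline.lcp) / baseline.lcp;
  // Fail loudly on missing metrics: a NaN delta would otherwise read as "not regressed".
  if (!Number.isFinite(siDelta) || !Number.isFinite(lcpDelta)) {
    throw new Error('Speed Index or LCP missing from baseline or candidate');
  }

  // Find the first frame where the candidate is >5pp less visually complete
  // than the baseline at the same point in time: that's where the hero slipped.
  // Frames are captured only when the screen changes, so the two runs rarely
  // share exact timestamps. Compare against the candidate's latest frame at or
  // before each baseline timestamp (frames are assumed sorted by time).
  let firstRegressionFrame = null;
  for (const base of baseline.frames) {
    const cand = candidate.frames.filter(f => f.timeMs <= base.timeMs).at(-1);
    if (cand && base.progress - cand.progress > 5) {
      firstRegressionFrame = { timeMs: base.timeMs, baseline: base.progress, candidate: cand.progress };
      break;
    }
  }

  const regressed = siDelta > tolerance.si || lcpDelta > tolerance.lcp;
  return { regressed, siDelta, lcpDelta, firstRegressionFrame };
}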
Step 4 — Wire it into CI
# .github/workflows/wpt-filmstrip.yml
name: WebPageTest filmstrip diff
on:
  pull_request:
    branches: [main]
jobs:
  filmstrip:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }
      # Assumes an earlier step with `id: deploy` (not shown) that publishes
      # the PR preview and sets a `preview-url` output.
      - name: Diff candidate deploy against production baseline
        run: node ci/wpt-gate.mjs
        env:
          WPT_API_KEY: ${{ secrets.WPT_API_KEY }}
          # PREVIEW_URL is the PR's deploy; BASELINE_URL is current production.
          # Both must resolve publicly so WPT agents can reach them.
          PREVIEW_URL: ${{ steps.deploy.outputs.preview-url }}
          BASELINE_URL: https://media-delivery.com/products/
// ci/wpt-gate.mjs: the glue script the job runs.
import { runTest } from './wpt-run.mjs';
import { extractMetrics } from './wpt-extract.mjs';
import { diffRuns } from './wpt-diff.mjs';

const baseline = extractMetrics(await runTest(process.env.BASELINE_URL));
const candidate = extractMetrics(await runTest(process.env.PREVIEW_URL));
const diff = diffRuns(baseline, candidate);

console.log(`Speed Index: ${baseline.speedIndex} -> ${candidate.speedIndex} (${(diff.siDelta * 100).toFixed(1)}%)`);
console.log(`LCP: ${baseline.lcp} -> ${candidate.lcp} (${(diff.lcpDelta * 100).toFixed(1)}%)`);
if (diff.firstRegressionFrame) {
  console.log(`First slipped frame at ${diff.firstRegressionFrame.timeMs}ms:`,
    `${diff.firstRegressionFrame.baseline}% -> ${diff.firstRegressionFrame.candidate}%`);
}

// Non-zero exit fails the check when the candidate regressed beyond tolerance.
process.exit(diff.regressed ? 1 : 0);
Verification steps
1. Confirm the API returns a median and filmstrip
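# Submit a one-off run and capture its test ID.
TEST_ID=$(curl -s "https://www.webpagetest.org/runtest.php?url=https%3A%2F%2Fmedia-delivery.com%2F&k=$WPT_API_KEY&f=json&runs=3&video=1&location=Dulles:Chrome&connectivity=4G" \
  | jq -r '.data.testId')
# Once the run has finished, fetch the result. jq should print the median
# Speed Index and a non-empty array of frame timestamps.
curl -s "https://www.webpagetest.org/jsonResult.php?test=$TEST_ID" \
  | jq '.data.median.firstView | {SpeedIndex, frames: [.videoFrames[].time]}'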
2. Prove the gate catches a real regression
Deploy a preview with loading="lazy" deliberately added to the LCP image and run the gate against it. The Speed Index and LCP deltas must exceed tolerance and the job must exit non-zero, with firstRegressionFrame naming the frame where the hero slipped. If it passes, your tolerance is too loose or the baseline and candidate used different location/connectivity values.
3. Confirm baseline and candidate used identical conditions
# Both result JSONs must report the same location and connectivity, or the
# diff is meaningless. Compare the two objects jq prints.
jq '.data | {location, connectivity}' baseline.json candidate.json
Expected metric deltas for a healthy build:
| Metric | Acceptable delta vs baseline | Interpretation |
|---|---|---|
| Speed Index | within ±10% | agent noise, not a regression |
| LCP | within ±10% | hero paints at the same frame |
| First slipped frame | none | no frame lost visual progress |
| Visual Complete | within ±10% | above-fold stabilizes at the same time |
Common mistakes and fixes
1. Trusting a single run
Anti-pattern: submitting runs=1 and comparing that number to the baseline.
Effect: WebPageTest agents are shared and their timings vary run-to-run by 10–20%. A single run diffs the noise, not the build, so the gate flaps green and red on identical code.
Fix: request runs=5 and compare the median run WebPageTest selects (or compute your own median). This mirrors the median-of-N rule used in Lighthouse CI budget enforcement and is non-negotiable for a stable gate.
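If you prefer to compute your own median rather than rely on the run WebPageTest nominates, a minimal sketch looks like this. It assumes the result exposes runs as an object keyed by run number, each with a firstView metrics object. The function name and the sample numbers are hypothetical.

```javascript
// Sketch: choose the median run yourself for a given metric.
// Assumes `runs` is an object keyed by run number whose values carry a
// `firstView` object of metrics; failed runs may lack the metric entirely.
function pickMedianRun(runs, metric = 'SpeedIndex') {
  const valid = Object.entries(runs)
    .filter(([, run]) => Number.isFinite(run.firstView?.[metric]))
    .sort(([, a], [, b]) => a.firstView[metric] - b.firstView[metric]);
  if (valid.length === 0) throw new Error(`no run reported ${metric}`);
  // With an even number of valid runs this takes the lower of the two middle runs.
  const [runNumber, run] = valid[Math.floor((valid.length - 1) / 2)];
  return { runNumber, firstView: run.firstView };
}

// Made-up results: run 5 failed and reported no Speed Index.
const sampleRuns = {
  1: { firstView: { SpeedIndex: 2100 } },
  2: { firstView: { SpeedIndex: 1800 } },
  3: { firstView: { SpeedIndex: 2600 } },
  4: { firstView: { SpeedIndex: 1900 } },
  5: {},
};

const median = pickMedianRun(sampleRuns);
console.log(median.runNumber, median.firstView.SpeedIndex); // 4 1900
```

Picking a whole run, instead of averaging metrics across runs, keeps Speed Index, LCP and the filmstrip consistent with each other because they all come from the same page load.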
2. Inconsistent location or connection profile
Anti-pattern: the baseline ran on Dulles:Chrome / Cable weeks ago; the candidate runs on London:Chrome / 4G today.
Effect: the diff conflates a network/geography change with a code change. A “regression” that is really just a slower connection profile blocks a clean PR.
Fix: pin location and connectivity as constants shared by both runs, assert they match (Verification step 3), and re-baseline whenever you deliberately change them. Latency and bandwidth dominate LCP, so they must be held constant.
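One way to enforce this in the gate itself is a small assertion that runs before any diffing. The sketch below is illustrative only. TEST_CONDITIONS and assertSameConditions are hypothetical names, and it assumes each run summary exposes location and connectivity as plain strings, so check where those fields live in your instance's result JSON.

```javascript
// Hypothetical shared constants: import these wherever a run is submitted.
const TEST_CONDITIONS = Object.freeze({ location: 'Dulles:Chrome', connectivity: '4G' });

// Throws when two result summaries were not captured under the same conditions.
// Assumes each summary exposes `location` and `connectivity` as plain strings.
function assertSameConditions(baseline, candidate, keys = Object.keys(TEST_CONDITIONS)) {
  const mismatches = keys.filter(key => baseline[key] !== candidate[key]);
  if (mismatches.length > 0) {
    const detail = mismatches
      .map(key => `${key}: ${baseline[key]} vs ${candidate[key]}`)
      .join('; ');
    throw new Error(`Runs are not comparable (${detail})`);
  }
}

// Usage with the anti-pattern described above: the check fails before any diffing.
try {
  assertSameConditions(
    { location: 'Dulles:Chrome', connectivity: 'Cable' },
    { location: 'London:Chrome', connectivity: '4G' },
  );
} catch (err) {
  console.log(err.message);
}
```

Failing with a distinct error here keeps a conditions mismatch from being reported as a performance regression.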
3. Omitting video=1
Anti-pattern: requesting metrics without the video flag.
Effect: WebPageTest returns LCP but no videoFrames, and because Speed Index is calculated from the captured video it is missing as well. The filmstrip diff has nothing to compare and firstRegressionFrame is always null, so you lose the visual evidence that is the whole point.
Fix: always pass video=1. It is what generates the frame captures the filmstrip diff reads.
4. Stale or absent baseline
Anti-pattern: comparing every candidate to a hard-coded baseline JSON committed a year ago.
Effect: slow drift accumulates undetected, or every PR looks like a huge regression because the baseline predates a legitimate redesign.
Fix: refresh the baseline from current production on a schedule (e.g. nightly) and store it as an artifact keyed by the production commit. The baseline should track the last-known-good production build, not a frozen snapshot.
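A stored baseline can carry enough metadata for the gate to reject it when it no longer matches production. The following is a minimal sketch. The commit and capturedAtMs fields, the checkBaseline name, and the 24-hour default are assumptions for illustration, not part of WebPageTest.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Sketch: decide whether a stored baseline is still safe to diff against.
// Assumes the baseline artifact was saved with `commit` (production commit it
// was measured on) and `capturedAtMs` (epoch milliseconds) alongside the metrics.
function checkBaseline(baseline, { productionCommit, nowMs, maxAgeMs = DAY_MS }) {
  if (!baseline) return { usable: false, reason: 'missing' };
  if (baseline.commit !== productionCommit) return { usable: false, reason: 'commit-mismatch' };
  if (nowMs - baseline.capturedAtMs > maxAgeMs) return { usable: false, reason: 'stale' };
  return { usable: true, reason: null };
}

// Made-up artifact captured 30 hours before "now".
const stored = { commit: 'abc123', capturedAtMs: 0, speedIndex: 1850, lcp: 2100 };
const verdict = checkBaseline(stored, { productionCommit: 'abc123', nowMs: 30 * 60 * 60 * 1000 });
console.log(verdict.usable, verdict.reason); // false stale
```

When the check fails, the gate can re-measure production before diffing instead of blocking the PR on an outdated number.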
5. Gating on LCP alone
Anti-pattern: asserting only on LCP and ignoring Speed Index and the frame diff.
Effect: a regression that delays secondary above-fold media (a second image, a background) without moving the single largest element passes, even though the page visibly loads slower.
Fix: gate on Speed Index and LCP and the first-slipped-frame check. LCP is one element; Speed Index and the filmstrip capture the whole above-fold render. For the field-side confirmation of an LCP regression, pair this with tracking LCP field data with the CrUX API.
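A gate that honours all three signals could be sketched as follows. It consumes the object shape returned by diffRuns in Step 3 (siDelta, lcpDelta, firstRegressionFrame). The gateOnAllSignals name and the sample diff are hypothetical, and treating any slipped frame as a failure is a stricter policy than the two-metric check, so expect to tune it.

```javascript
// Sketch: fail the check on Speed Index, LCP, or a slipped filmstrip frame.
// `diff` is assumed to have the shape { siDelta, lcpDelta, firstRegressionFrame }.
function gateOnAllSignals(diff, tolerance = { si: 0.10, lcp: 0.10 }) {
  const failures = [];
  if (diff.siDelta > tolerance.si) {
    failures.push(`Speed Index regressed by ${(diff.siDelta * 100).toFixed(1)}%`);
  }
  if (diff.lcpDelta > tolerance.lcp) {
    failures.push(`LCP regressed by ${(diff.lcpDelta * 100).toFixed(1)}%`);
  }
  if (diff.firstRegressionFrame) {
    const f = diff.firstRegressionFrame;
    failures.push(`Frame at ${f.timeMs}ms slipped from ${f.baseline}% to ${f.candidate}%`);
  }
  return { pass: failures.length === 0, failures };
}

// Made-up diff: both metrics are inside tolerance, but secondary media
// dropped out of the 1200 ms frame.
const sampleDiff = {
  siDelta: 0.04,
  lcpDelta: 0.02,
  firstRegressionFrame: { timeMs: 1200, baseline: 85, candidate: 40 },
};
const result = gateOnAllSignals(sampleDiff);
console.log(result.pass); // false
console.log(result.failures.length); // 1
```

Returning the list of reasons, not just a boolean, lets the CI log say which signal tripped the gate.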
Tradeoff: filmstrip runs are far slower and costlier than a Lighthouse budget — five runs across two URLs, each queued on a shared agent, can take minutes. Reserve filmstrip diffing for the handful of templates where render timing is the product (landing pages, article heroes) and let the cheaper byte budget guard the rest.
Related
- Monitoring & Regression for Media Delivery — where filmstrip diffing sits in the full lab-and-field loop
- Lighthouse CI Budget Enforcement for Image Weight — the cheaper byte-budget gate that runs alongside this
- Tracking LCP Field Data with the CrUX API — confirming a lab LCP regression against real-user data
- Using fetchpriority to optimize critical media — the fix when the filmstrip shows the hero painting late
- AVIF vs WebP Compression Benchmarks — format choices that move the hero’s paint frame earlier