WebPageTest filmstrip diff automation for LCP

A byte budget can pass while the page still feels slower, because two builds can transfer identical bytes yet paint the hero in a different frame. WebPageTest closes that gap: it records a frame-by-frame filmstrip and computes Speed Index and LCP, so you can diff the visual progress of a candidate build against a stored baseline and see exactly which frame the media appears in. This guide, part of Monitoring & Regression for Media Delivery under CDN & Edge Media Delivery, shows how to drive the WebPageTest API from Node.js, pull the filmstrip and metrics, and turn a visual regression into a red CI check. It is the render-timeline complement to Lighthouse CI budget enforcement.

Prerequisite checklist

A WebPageTest API key (from the public instance at webpagetest.org or a self-hosted / private WPT install)
A fixed test location and connection profile chosen and written down — e.g. Dulles:Chrome on 4G — because location and network directly determine the numbers
A publicly reachable URL for both the baseline (current production) and the candidate (a preview deploy of the PR)
Node 18+ (the examples use the built-in fetch; no HTTP library needed)
Somewhere to store the baseline result JSON between runs (a committed fixture, an artifact bucket, or the last production run’s testId)

Warning: the public WebPageTest API is rate-limited and runs on shared agents whose queue depth varies through the day. For a gate that runs on every PR, use a private/self-hosted WPT instance or a paid API tier; otherwise queue time makes the job flap on timeouts rather than on real regressions.

What the filmstrip tells you that a budget can’t

Speed Index is the average time at which visible parts of the page are painted. It is computed from the filmstrip as the area above the visual-completeness curve, that is, the integral over time of the fraction of the viewport not yet painted. A lower Speed Index means content appears earlier. LCP marks the single frame where the largest element (usually the hero image) finishes painting. Diffing these between builds catches regressions that leave byte weight untouched: a hero that lost its fetchpriority hint, a lazy-loading attribute that crept onto the LCP image, or a render-blocking script inserted above it. None of those change total image bytes, but all of them push the hero’s paint into a later frame.
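
To make the definition concrete, here is a minimal sketch of the arithmetic, assuming filmstrip frames shaped like the { timeMs, progress } objects used later in this guide. It treats visual completeness as a step function that holds its value until the next frame. The function name speedIndexFromFrames is hypothetical, and WebPageTest's own calculation works on the captured video, so its numbers may differ in detail.

```javascript
// Sketch: Speed Index as the area ABOVE the visual-completeness curve.
// Assumes frames are sorted by time and progress is a 0-100 percentage.
function speedIndexFromFrames(frames) {
  let area = 0;
  let prevTime = 0;
  let prevProgress = 0; // nothing is painted before the first frame
  for (const { timeMs, progress } of frames) {
    // Between prevTime and timeMs the viewport stayed at prevProgress,
    // so (1 - prevProgress/100) of it was still unpainted.
    area += (timeMs - prevTime) * (1 - prevProgress / 100);
    prevTime = timeMs;
    prevProgress = progress;
  }
  return area;
}

// Two builds with identical content; in the second, the hero (half the
// viewport) paints 500 ms later.
const early = [{ timeMs: 500, progress: 50 }, { timeMs: 1000, progress: 100 }];
const late = [{ timeMs: 500, progress: 50 }, { timeMs: 1500, progress: 100 }];

console.log(speedIndexFromFrames(early)); // 750
console.log(speedIndexFromFrames(late));  // 1000
```

The late build scores worse even though both builds paint exactly the same pixels, which is the kind of change a byte budget cannot see.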

Exact solution

Step 1 — Submit a test and poll for completion

WebPageTest’s REST API is two calls: runtest.php submits and returns a testId; jsonResult.php returns the full result once the run finishes. Request multiple runs so you can take the median.

// wpt-run.mjs — submit a WebPageTest run and resolve with its JSON result.
const WPT_HOST = 'https://www.webpagetest.org';
const API_KEY = process.env.WPT_API_KEY;

export async function runTest(url) {
  const params = new URLSearchParams({
    url,
    k: API_KEY,
    f: 'json',                 // return JSON, not the HTML result page
    runs: '5',                 // FIVE runs so we can take a stable median
    fvonly: '1',               // first-view only; repeat-view doubles cost, not needed for LCP
    location: 'Dulles:Chrome', // PIN the location — a different agent = different numbers
    connectivity: '4G',        // PIN the network profile; must match the baseline exactly
    lighthouse: '1',           // also compute Lighthouse metrics (gives a clean LCP)
    video: '1',                // REQUIRED to get filmstrip frames back
  });

  // Submit the test.
  const submit = await fetch(`${WPT_HOST}/runtest.php?${params}`).then(r => r.json());
  if (submit.statusCode !== 200) {
    throw new Error(`WPT submit failed: ${submit.statusText}`);
  }
  const { testId, jsonUrl } = submit.data;

  // Poll jsonResult until statusCode 200 (complete). 1xx = not ready yet
  // (100 = test started, 101 = waiting in the queue).
  for (let attempt = 0; attempt < 60; attempt++) {
    const res = await fetch(jsonUrl).then(r => r.json());
    if (res.statusCode === 200) return res.data;        // done
    if (res.statusCode >= 400) throw new Error(res.statusText);
    await new Promise(r => setTimeout(r, 5000));         // wait 5s between polls
  }
  throw new Error(`WPT test ${testId} did not complete in time`);
}

Step 2 — Extract the median metrics and filmstrip

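WebPageTest already nominates a median run. By default it picks that run by load time rather than Speed Index; the medianMetric parameter (for example medianMetric=SpeedIndex on the jsonResult.php request) changes the metric it uses. Pull the median run's metrics and the filmstrip frames from the result tree.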

// wpt-extract.mjs — reduce a raw WPT result to the numbers we gate on.
export function extractMetrics(data) {
  // data.median.firstView is the run WPT chose as representative.
  const fv = data.median.firstView;

  return {
    testId: data.testId,
    // Speed Index (ms) — integral of visual completeness; lower is better.
    speedIndex: fv.SpeedIndex,
    // LCP (ms). WebPageTest exposes it as chromeUserTiming.LargestContentfulPaint
    // or as the lighthouse LCP; prefer the Chrome trace value when present.
    lcp: fv['chromeUserTiming.LargestContentfulPaint']
         ?? fv['lighthouse.LargestContentfulPaint'],
    // Visual Complete (ms) — when the above-fold stopped changing.
    visualComplete: fv.visualComplete,
    // Filmstrip: array of { time, image, VisuallyComplete } frames.
    frames: (fv.videoFrames ?? []).map(f => ({
      timeMs: f.time,
      progress: f.VisuallyComplete,   // 0–100 visual completeness at this frame
      image: f.image,                 // URL of the frame thumbnail
    })),
  };
}

Step 3 — Diff candidate against baseline

Compare medians and flag the frame where visual progress diverges. A tolerance absorbs normal agent noise; anything beyond it is a real regression.

// wpt-diff.mjs — compare a candidate run to a stored baseline.
export function diffRuns(baseline, candidate, tolerance = { si: 0.10, lcp: 0.10 }) {
  const siDelta = (candidate.speedIndex - baseline.speedIndex) / baseline.speedIndex;
  const lcpDelta = (candidate.lcp - baseline.lcp) / baseline.lcp;

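  // Find the first baseline frame where the candidate is >5pp less visually
  // complete. Frames are captured when the screen changes, so timestamps from
  // two runs rarely line up exactly; compare against the candidate's latest
  // frame at or before the baseline timestamp (0% if nothing has painted yet).
  let firstRegressionFrame = null;
  if (candidate.frames.length > 0) {
    for (const base of baseline.frames) {
      const cand = candidate.frames.filter(f => f.timeMs <= base.timeMs).at(-1);
      const candProgress = cand ? cand.progress : 0;
      if (base.progress - candProgress > 5) {
        firstRegressionFrame = { timeMs: base.timeMs, baseline: base.progress, candidate: candProgress };
        break;
      }
    }
  }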

  const regressed = siDelta > tolerance.si || lcpDelta > tolerance.lcp;
  return { regressed, siDelta, lcpDelta, firstRegressionFrame };
}
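
One caveat with the relative deltas above: if a metric is missing from either result (for example, LCP comes back undefined), the division yields NaN, and NaN > tolerance is false, so the gate passes silently. A minimal guard might look like the sketch below. relativeDelta is a hypothetical helper, not part of WebPageTest or of the modules above.

```javascript
// Sketch: a relative delta that fails loudly instead of yielding NaN.
// relativeDelta is a hypothetical helper name.
function relativeDelta(name, baseline, candidate) {
  for (const [label, value] of [['baseline', baseline], ['candidate', candidate]]) {
    if (!Number.isFinite(value)) {
      throw new Error(`${name}: ${label} value is missing or not a number (${value})`);
    }
  }
  if (baseline <= 0) {
    throw new Error(`${name}: baseline must be positive to compute a relative change`);
  }
  return (candidate - baseline) / baseline;
}

console.log(relativeDelta('lcp', 2000, 2300)); // 0.15

try {
  relativeDelta('lcp', 2000, undefined);
} catch (err) {
  console.log(err.message); // the gate stops here instead of passing on NaN
}
```

Swapping this in for the two inline divisions would turn a missing metric into a failed check rather than a false green.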

Step 4 — Wire it into CI

# .github/workflows/wpt-filmstrip.yml
name: WebPageTest filmstrip diff
on:
  pull_request:
    branches: [main]

jobs:
  filmstrip:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with: { node-version: 20 }

      - name: Diff candidate deploy against production baseline
        run: node ci/wpt-gate.mjs
        env:
          WPT_API_KEY: ${{ secrets.WPT_API_KEY }}
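          # PREVIEW_URL is the PR's deploy; BASELINE_URL is current production.
          # Both must resolve publicly so WPT agents can reach them.
          # Assumes an earlier step with `id: deploy` that outputs preview-url;
          # this workflow does not define one, so add your deploy step above.
          PREVIEW_URL: ${{ steps.deploy.outputs.preview-url }}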
          BASELINE_URL: https://media-delivery.com/products/

// ci/wpt-gate.mjs — the glue script the job runs.
import { runTest } from './wpt-run.mjs';
import { extractMetrics } from './wpt-extract.mjs';
import { diffRuns } from './wpt-diff.mjs';

const baseline = extractMetrics(await runTest(process.env.BASELINE_URL));
const candidate = extractMetrics(await runTest(process.env.PREVIEW_URL));
const diff = diffRuns(baseline, candidate);

console.log(`Speed Index: ${baseline.speedIndex} -> ${candidate.speedIndex} (${(diff.siDelta * 100).toFixed(1)}%)`);
console.log(`LCP:         ${baseline.lcp} -> ${candidate.lcp} (${(diff.lcpDelta * 100).toFixed(1)}%)`);
if (diff.firstRegressionFrame) {
  console.log(`First slipped frame at ${diff.firstRegressionFrame.timeMs}ms:`,
    `${diff.firstRegressionFrame.baseline}% -> ${diff.firstRegressionFrame.candidate}%`);
}
// Non-zero exit fails the check when the candidate regressed beyond tolerance.
process.exit(diff.regressed ? 1 : 0);

Verification steps

1. Confirm the API returns a median and filmstrip

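# Submit a one-off run and capture its testId.
TEST_ID=$(curl -s "https://www.webpagetest.org/runtest.php?url=https%3A%2F%2Fmedia-delivery.com%2F&k=$WPT_API_KEY&f=json&runs=3&video=1&location=Dulles:Chrome&connectivity=4G" \
  | jq -r '.data.testId')
# Once the run has finished, confirm the result contains videoFrames.
# jq should print a non-empty array of frame timestamps.
curl -s "https://www.webpagetest.org/jsonResult.php?test=$TEST_ID" \
  | jq '.data.median.firstView.videoFrames | map(.time)'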

2. Prove the gate catches a real regression

Deploy a preview with loading="lazy" deliberately added to the LCP image and run the gate against it. The Speed Index and LCP deltas must exceed tolerance and the job must exit non-zero, with firstRegressionFrame naming the frame where the hero slipped. If it passes, your tolerance is too loose or the baseline and candidate used different location/connectivity values.

3. Confirm baseline and candidate used identical conditions

# Both result JSONs must report the same location and connectivity, or the
# diff is meaningless. Compare the two testInfo blocks.
jq '.data.testInfo | {location, connectivity}' baseline.json candidate.json

Expected metric deltas for a healthy build:

Metric	Acceptable delta vs baseline	Interpretation
Speed Index	within ±10%	agent noise, not a regression
LCP	within ±10%	hero paints at the same frame
First slipped frame	none	no frame lost visual progress
Visual Complete	within ±10%	above-fold stabilizes at the same time

Common mistakes and fixes

1. Trusting a single run

Anti-pattern: submitting runs=1 and comparing that number to the baseline.

Effect: WebPageTest agents are shared and their timings vary run-to-run by 10–20%. A single run diffs the noise, not the build, so the gate flaps green and red on identical code.

Fix: request runs=5 and compare the median run WebPageTest selects (or compute your own median). This mirrors the median-of-N rule used in Lighthouse CI budget enforcement and is non-negotiable for a stable gate.
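
If you prefer to compute your own median, a sketch follows. It assumes the result JSON exposes the individual runs as data.runs, an object keyed by run number whose entries each carry a firstView block; verify that shape against your own instance's output. medianRunBy is a hypothetical helper name.

```javascript
// Sketch: pick the median run by a chosen metric instead of using data.median.
// Assumes data.runs looks like { "1": { firstView: {...} }, "2": ... }.
function medianRunBy(data, metric = 'SpeedIndex') {
  const views = Object.values(data.runs ?? {})
    .map(run => run.firstView)
    .filter(fv => fv && Number.isFinite(fv[metric]));
  if (views.length === 0) {
    throw new Error(`no run reported a numeric ${metric}`);
  }
  views.sort((a, b) => a[metric] - b[metric]);
  // With an even count, take the lower-middle run so the result is a real
  // run (with its own filmstrip), not an average of two.
  return views[Math.floor((views.length - 1) / 2)];
}

// Five runs, one of them an outlier from a busy agent.
const sample = {
  runs: {
    1: { firstView: { SpeedIndex: 2100 } },
    2: { firstView: { SpeedIndex: 1800 } },
    3: { firstView: { SpeedIndex: 3500 } },
    4: { firstView: { SpeedIndex: 1900 } },
    5: { firstView: { SpeedIndex: 2000 } },
  },
};
console.log(medianRunBy(sample).SpeedIndex); // 2000
```

Returning a whole run rather than an averaged number keeps the metrics and the filmstrip frames from the same page load.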

2. Inconsistent location or connection profile

Anti-pattern: the baseline ran on Dulles:Chrome / Cable weeks ago; the candidate runs on London:Chrome / 4G today.

Effect: the diff conflates a network/geography change with a code change. A “regression” that is really just a slower connection profile blocks a clean PR.

Fix: pin location and connectivity as constants shared by both runs, assert they match (Verification step 3), and re-baseline whenever you deliberately change them. Latency and bandwidth dominate LCP, so they must be held constant.
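
One way to enforce this in the gate itself is sketched below: a single frozen constant feeds both submissions, and a check compares what the two results report. The testInfo shape ({ location, connectivity }) follows the jq query in Verification step 3 and is an assumption about your instance's output; WPT_CONDITIONS and assertSameConditions are hypothetical names.

```javascript
// Sketch: one source of truth for test conditions, plus a post-run check.
const WPT_CONDITIONS = Object.freeze({ location: 'Dulles:Chrome', connectivity: '4G' });

// baselineInfo / candidateInfo are the testInfo blocks of the two results.
function assertSameConditions(baselineInfo, candidateInfo) {
  const mismatches = ['location', 'connectivity']
    .filter(key =>
      baselineInfo?.[key] === undefined || baselineInfo[key] !== candidateInfo?.[key])
    .map(key =>
      `${key}: baseline=${baselineInfo?.[key]} candidate=${candidateInfo?.[key]}`);
  if (mismatches.length > 0) {
    throw new Error(`WPT conditions differ, runs are not comparable (${mismatches.join('; ')})`);
  }
}

// Passes: both runs report the pinned profile.
assertSameConditions({ ...WPT_CONDITIONS }, { ...WPT_CONDITIONS });

// Fails: the baseline was recorded on a different connection profile.
try {
  assertSameConditions({ location: 'Dulles:Chrome', connectivity: 'Cable' }, WPT_CONDITIONS);
} catch (err) {
  console.log(err.message);
}
```

Spreading WPT_CONDITIONS into the URLSearchParams for every submission keeps the pinned values from drifting between the baseline and candidate calls.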

3. Omitting video=1

Anti-pattern: requesting metrics without the video flag.

Effect: WebPageTest returns Speed Index and LCP but no videoFrames, so the filmstrip diff has nothing to compare and firstRegressionFrame is always null — you lose the visual evidence that is the whole point.

Fix: always pass video=1. It is what generates the frame captures the filmstrip diff reads.
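
Because a missing filmstrip otherwise shows up only as a quiet null, it can help to fail fast. The sketch below assumes the metrics object produced by extractMetrics in Step 2, with its frames array; assertHasFilmstrip is a hypothetical helper name.

```javascript
// Sketch: refuse to diff when a result came back without filmstrip frames.
function assertHasFilmstrip(label, metrics) {
  if (!Array.isArray(metrics.frames) || metrics.frames.length === 0) {
    throw new Error(`${label}: no filmstrip frames in result; was the test submitted with video=1?`);
  }
}

// A result with frames passes quietly.
assertHasFilmstrip('baseline', { frames: [{ timeMs: 0, progress: 0 }] });

// A result without frames stops the gate with an actionable message.
try {
  assertHasFilmstrip('candidate', { frames: [] });
} catch (err) {
  console.log(err.message);
}
```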

4. Stale or absent baseline

Anti-pattern: comparing every candidate to a hard-coded baseline JSON committed a year ago.

Effect: slow drift accumulates undetected, or every PR looks like a huge regression because the baseline predates a legitimate redesign.

Fix: refresh the baseline from current production on a schedule (e.g. nightly) and store it as an artifact keyed by the production commit. The baseline should track the last-known-good production build, not a frozen snapshot.
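
A sketch of what "keyed by the production commit" could look like in code follows. The record shape, the function names, and the 36-hour freshness window are all assumptions chosen for illustration; where the record is stored (artifact, bucket, fixture) is left to your pipeline.

```javascript
// Sketch: wrap baseline metrics with the commit and capture time they belong to.
function makeBaselineRecord(metrics, commit, now = new Date()) {
  return { commit, capturedAt: now.toISOString(), metrics };
}

// A baseline is usable only if it was captured from the commit currently in
// production and is not older than maxAgeHours (36 is an arbitrary default
// that tolerates one missed nightly refresh).
function isBaselineUsable(record, productionCommit, maxAgeHours = 36, now = new Date()) {
  if (!record || record.commit !== productionCommit) return false;
  const ageHours = (now.getTime() - Date.parse(record.capturedAt)) / 3_600_000;
  return ageHours >= 0 && ageHours <= maxAgeHours;
}

const record = makeBaselineRecord(
  { speedIndex: 2000, lcp: 1800 },
  'abc123',
  new Date('2024-01-01T00:00:00Z'),
);

console.log(isBaselineUsable(record, 'abc123', 36, new Date('2024-01-02T00:00:00Z'))); // true
console.log(isBaselineUsable(record, 'abc123', 36, new Date('2024-01-03T00:00:00Z'))); // false
console.log(isBaselineUsable(record, 'def456', 36, new Date('2024-01-02T00:00:00Z'))); // false
```

When the check returns false, the gate can fall back to running a fresh baseline test against production instead of diffing against a stale record.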

5. Gating on LCP alone

Anti-pattern: asserting only on LCP and ignoring Speed Index and the frame diff.

Effect: a regression that delays secondary above-fold media (a second image, a background) without moving the single largest element passes, even though the page visibly loads slower.

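Fix: gate on Speed Index and LCP together, and treat the first-slipped-frame check as a third signal. Note that diffRuns as written in Step 3 only reports firstRegressionFrame; to make it block the build, include it in the regressed condition. LCP is one element; Speed Index and the filmstrip capture the whole above-fold render. For the field-side confirmation of an LCP regression, pair this with tracking LCP field data with the CrUX API.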

Tradeoff: filmstrip runs are far slower and costlier than a Lighthouse budget — five runs across two URLs, each queued on a shared agent, can take minutes. Reserve filmstrip diffing for the handful of templates where render timing is the product (landing pages, article heroes) and let the cheaper byte budget guard the rest.

Monitoring & Regression for Media Delivery — where filmstrip diffing sits in the full lab-and-field loop
Lighthouse CI Budget Enforcement for Image Weight — the cheaper byte-budget gate that runs alongside this
Tracking LCP Field Data with the CrUX API — confirming a lab LCP regression against real-user data
Using fetchpriority to optimize critical media — the fix when the filmstrip shows the hero painting late
AVIF vs WebP Compression Benchmarks — format choices that move the hero’s paint frame earlier