Core Media Fundamentals & Next-Gen Formats

Every millisecond of Largest Contentful Paint delay and every kilobyte of unnecessary payload translates directly into lost conversions, higher infrastructure bills, and lower Core Web Vitals scores. Getting media delivery right means making precise decisions at every layer: which codec to encode with, which quantizer range to target, how HTTP headers instruct CDN caches to store format-negotiated variants, and how build pipelines automate the transcoding work at scale. This section covers the foundational theory and production patterns across all of those layers.


What this section covers

The topics below form the systematic architecture for next-gen media delivery. Each area is explored in depth across its own dedicated pages.

AVIF vs WebP Compression Benchmarks — SSIM/VMAF scoring methodology, file-size comparisons at matched perceptual quality, and the conditions under which AVIF’s superior compression justifies its slower encode time versus WebP’s broader decode support.

MIME Type Configuration for Modern Media Servers — Correct Content-Type registration for image/avif, image/webp, video/webm, and video/mp4; codecs=av01 on Nginx, Apache, and Caddy. Misconfigured MIME types cause browsers to silently discard <source> elements.

Cache-Control Headers for Image and Video Assets — max-age, immutable, stale-while-revalidate, and the Vary: Accept directive that prevents CDNs from serving a cached WebP to an AVIF-capable client.

Understanding Video Codecs: VP9 vs H.265 vs AV1 — Codec lineage, hardware decoding matrices, licensing considerations, and encode-time vs compression tradeoffs for streaming and background video use cases.


Pipeline overview

The diagram below shows how format selection, HTTP negotiation, CDN caching, and client-side loading interact end-to-end. Each stage is a decision point where a wrong setting propagates downstream errors.

Figure: End-to-end media delivery pipeline. A five-stage flow diagram, connected left to right: Source Asset → Build Pipeline (Sharp / FFmpeg) → Origin Server (Nginx + headers) → CDN Edge (Vary: Accept) → Browser Render (picture / srcset).

  • Source Asset: JPEG / PNG, MP4 / MOV.
  • Build Pipeline (encode): Sharp → AVIF/WebP, FFmpeg → AV1/VP9, avifenc / cwebp, content hash → URL. Failure point: slow encode = CI bottleneck.
  • Origin Server (deploy): Nginx / Caddy, Content-Type headers, Cache-Control rules, Vary: Accept. Failure point: wrong MIME = source dropped.
  • CDN Edge (cache): Cloudflare / Fastly, Vary-aware cache key, max-age=31536000, immutable directive. Failure point: missing Vary = format poisoning.
  • Browser (serve): <picture> / srcset, fetchpriority=high, loading=lazy, decoding=async.

Core theory: codecs, compression, and HTTP semantics

Image codec fundamentals

AVIF — derived from the AV1 video codec — encodes still images using the same intra-frame prediction, transform, and entropy coding tools that make AV1 competitive with HEVC for video. The key encoding parameters are --min and --max (quantizer range, where 0 is lossless and 63 is maximum loss — inverted from most quality sliders), --speed (0–10, lower is slower but smaller), and --depth (8 or 10 bits per channel). At matched perceptual quality as measured by SSIM or VMAF, AVIF typically produces files 30–50% smaller than JPEG and 15–25% smaller than WebP.

WebP uses VP8 intra-frame coding for lossy compression and a purpose-built lossless mode for graphics with sharp edges and flat regions. Its quality parameter (-q 0–100) runs in the conventional direction: higher means better quality. WebP enjoys wider decode support than AVIF (every browser since Chrome 23/Firefox 65/Safari 14), making it the safer fallback to list after AVIF.

Chroma subsampling governs how colour information is sampled relative to luminance. 4:2:0 halves colour resolution in both dimensions, which cuts the chroma sample count by 75% and the total raw sample count by half, with minimal visible impact on photographic content. 4:4:4 retains full colour fidelity, which matters for text overlaid on images, UI screenshots, and product imagery where colour accuracy is brand-critical. Encoding UI graphics with 4:2:0 in avifenc produces visible chroma bleeding on high-contrast edges.
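The sample-count arithmetic behind subsampling can be checked directly. The sketch below is illustrative only: sampleCounts is a hypothetical helper, the 1200×800 frame is an assumed example size, and it counts raw samples before any compression, so encoded file-size savings will be smaller.

```javascript
// Raw sample counts per frame for common chroma subsampling modes.
// Odd dimensions are rounded up, as encoders typically do.
function sampleCounts(width, height, mode) {
  // Divisors applied to each chroma plane: [horizontal, vertical].
  const divisors = { '4:4:4': [1, 1], '4:2:2': [2, 1], '4:2:0': [2, 2] };
  const [dx, dy] = divisors[mode];
  const luma = width * height;
  // Two chroma planes (Cb and Cr), each reduced by the divisors.
  const chroma = 2 * Math.ceil(width / dx) * Math.ceil(height / dy);
  return { luma, chroma, total: luma + chroma };
}

const full = sampleCounts(1200, 800, '4:4:4');
const sub = sampleCounts(1200, 800, '4:2:0');

console.log(1 - sub.chroma / full.chroma); // 0.75: three quarters of chroma samples dropped
console.log(1 - sub.total / full.total);   // 0.5: half of all raw samples dropped
```

The luma plane is untouched in every mode, which is why fine luminance detail survives while colour edges soften.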

Video codec fundamentals

For background video, short clips, and streaming, codec selection involves three axes: compression efficiency, hardware decode availability, and licensing cost. H.264 (AVC) decodes in hardware on every device made in the last decade, carries no runtime royalty for free distribution, and is the universal baseline. VP9, developed by Google as a royalty-free H.265 alternative, achieves roughly 30–50% better compression than H.264 at matched quality. AV1 extends that improvement by another 20–30% over VP9 but requires more complex decode logic. Hardware AV1 decode ships on Apple M3 and A17 Pro or later, Intel 11th-generation Core and Arc, AMD RDNA2 and later, and Qualcomm Snapdragon 8 Gen 2 and later; earlier Apple Silicon and older mobile chips fall back to costly software decode. H.265 (HEVC) matches or exceeds AV1 compression on some encoders but carries mandatory patent royalties. Full codec analysis and hardware matrix: Understanding Video Codecs: VP9 vs H.265 vs AV1.

HTTP negotiation semantics

The Accept request header signals which image formats the browser supports. A Chrome request includes image/avif,image/webp,*/*;q=0.8. Servers and CDNs that serve format-negotiated content from a single URL must include Vary: Accept in the response so intermediate caches store separate variants per Accept value. Without it, a CDN that cached a WebP response will serve the WebP to the next AVIF-capable client, negating the format negotiation entirely.
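To make the poisoning scenario concrete, here is a minimal simulation. It is a sketch under simplifying assumptions: negotiateFormat and createCache are hypothetical helpers, q-values are ignored, and the cache keys on the negotiated format where a real cache would key on the Accept value itself (or a normalised form of it).

```javascript
// Pick the best image format the client advertises in its Accept header.
// Simplification: q-values are ignored, so "image/avif;q=0" would still match.
function negotiateFormat(accept = '') {
  const types = accept.split(',').map((part) => part.split(';')[0].trim());
  if (types.includes('image/avif')) return 'avif';
  if (types.includes('image/webp')) return 'webp';
  return 'jpeg';
}

// Toy shared cache. With honourVary=false the key is the URL alone, which is
// how a cache behaves when the origin omits Vary: Accept.
function createCache(honourVary) {
  const store = new Map();
  return function fetchThroughCache(url, accept) {
    const key = honourVary ? `${url}|${negotiateFormat(accept)}` : url;
    if (!store.has(key)) store.set(key, negotiateFormat(accept)); // origin fetch
    return store.get(key);
  };
}

const webpOnlyClient = 'image/webp,*/*;q=0.8';
const avifClient = 'image/avif,image/webp,*/*;q=0.8';

const withoutVary = createCache(false);
withoutVary('/hero', webpOnlyClient);              // first request fills the cache
console.log(withoutVary('/hero', avifClient));     // "webp": the AVIF-capable client is poisoned

const withVary = createCache(true);
withVary('/hero', webpOnlyClient);
console.log(withVary('/hero', avifClient));        // "avif"
```

Keying on a normalised token instead of the raw header keeps the number of cached variants small, since browsers send many slightly different Accept strings.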

Content-Type must match the actual bytes. Nginx does not auto-detect AVIF or WebM without explicit MIME type registration. A missing or wrong Content-Type causes Firefox and Safari to reject the resource silently, with no error in the console — only a missing image or blank video. Correct registration is covered in depth at MIME type configuration for modern media servers.
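One way to catch a Content-Type that disagrees with the bytes is to sniff the file signature in a deploy check. The function below is a rough sketch, not a full detector: sniffMime is a hypothetical name, the MP4 brand list is a small assumed subset, and the EBML signature is shared by WebM and Matroska, so that branch is only a hint.

```javascript
// Identify a media container from its leading bytes (needs at least 12).
// `bytes` is a Uint8Array or Node Buffer.
function sniffMime(bytes) {
  const ascii = (start, end) => String.fromCharCode(...bytes.slice(start, end));

  if (bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) return 'image/jpeg';
  if (bytes[0] === 0x89 && ascii(1, 4) === 'PNG') return 'image/png';
  if (ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WEBP') return 'image/webp';

  // ISO BMFF files carry a 4-byte box size, then "ftyp", then the major brand.
  if (ascii(4, 8) === 'ftyp') {
    const brand = ascii(8, 12);
    if (brand === 'avif' || brand === 'avis') return 'image/avif';
    // Assumed subset of common MP4 brands; other brands are left unidentified.
    if (['isom', 'iso2', 'mp41', 'mp42', 'avc1'].includes(brand)) return 'video/mp4';
    return null;
  }

  // EBML magic number: WebM or Matroska.
  if (bytes[0] === 0x1a && bytes[1] === 0x45 && bytes[2] === 0xdf && bytes[3] === 0xa3) {
    return 'video/webm';
  }
  return null;
}

// Example: the first 12 bytes of an AVIF still image.
const avifHead = Uint8Array.from([0, 0, 0, 0x1c, ...Buffer.from('ftypavif')]);
console.log(sniffMime(avifHead)); // "image/avif"
```

Comparing this result with the Content-Type returned by the server for each built asset flags a missing MIME registration before a browser silently drops the resource.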


Reference data: compression and decode performance

The table below reflects typical results from encoding a 2 MP photographic JPEG at visual equivalence (SSIM ≈ 0.95). Decode times are measured on a 2021 MacBook Pro (M1) running Chrome 120; mobile figures (Pixel 7) are higher due to software-path AV1 decode on older SoCs.

| Format | Typical file size | vs JPEG baseline | SSIM at size | Encode time (Sharp / libvips) | Hardware decode | Safari 14 | Safari 16 | Chrome 85+ | Firefox 93+ | Edge 18+ |
|---|---|---|---|---|---|---|---|---|---|---|
| JPEG | 280 KB | baseline | 0.95 | ~80 ms | Universal | Yes | Yes | Yes | Yes | Yes |
| WebP (q=80) | 195 KB | −30% | 0.95 | ~110 ms | Partial (Chrome/Edge) | Yes | Yes | Yes | Yes | Yes |
| AVIF (q=80 equiv) | 155 KB | −45% | 0.95 | ~900 ms | Apple Silicon, Chrome 121+ | No | Yes | Yes | Yes | Yes (121+) |
| JPEG XL | 145 KB | −48% | 0.95 | ~1200 ms | None currently | No | No | Experimental | No | No |

Note on AVIF encode time: the ~900 ms figure uses avifenc --speed 6. Dropping to --speed 4 or lower increases encode time 3–5× for marginal quality gains. In CI/CD pipelines, set --speed 6 or --speed 8 for trunk builds and reserve slower presets for pre-release or CDN on-the-fly encoding.
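The percentage column in the image table follows directly from the file sizes. A small sketch of that arithmetic, using the table's own figures (savingVsBaseline is a hypothetical helper name):

```javascript
// Percentage saved relative to a baseline file size, rounded to a whole percent.
function savingVsBaseline(sizeKb, baselineKb) {
  return Math.round((1 - sizeKb / baselineKb) * 100);
}

const jpegBaselineKb = 280;
const sizesKb = { webp: 195, avif: 155, jpegXl: 145 };

for (const [format, kb] of Object.entries(sizesKb)) {
  console.log(`${format}: -${savingVsBaseline(kb, jpegBaselineKb)}%`);
}
// webp: -30%
// avif: -45%
// jpegXl: -48%
```

Running the same function over real build output is a quick way to confirm that an encoder setting change actually moved file sizes in the expected direction.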

Video format reference

| Codec | Container | Bitrate vs H.264 | HW decode (2024+) | Royalty-free | Safari 14 | Chrome 85+ | Firefox 93+ | Edge 18+ |
|---|---|---|---|---|---|---|---|---|
| H.264 (AVC) | MP4 | baseline | Universal | No (MPEG LA) | Yes | Yes | Yes | Yes |
| VP9 | WebM | −30–50% | Partial (no Apple) | Yes | No | Yes | Yes | Yes |
| H.265 (HEVC) | MP4 | −40–50% | Wide (incl. Apple) | No (HEVC Advance) | Yes | No | No | Partial |
| AV1 | MP4/WebM | −50–60% | Apple M3/A17 Pro+, Intel 11+, Snapdragon 8 Gen 2+ | Yes | No (Safari 17+ only) | Yes (Chrome 90+) | Yes | Yes |

Canonical delivery pattern

The following snippet represents the production-standard format fallback chain for a Largest Contentful Paint image. Every attribute is annotated.

<picture>
  <!--
    AVIF source: highest compression, ~45% smaller than JPEG at matched quality.
    Served to Chrome 85+, Firefox 93+, Safari 16+.
    Nginx must register image/avif as a MIME type — see MIME type guide.
  -->
  <source
    srcset="hero-480w.avif 480w, hero-800w.avif 800w, hero-1200w.avif 1200w"
    sizes="(max-width: 600px) 100vw, (max-width: 1200px) 50vw, 800px"
    type="image/avif"
  >
  <!--
    WebP fallback: ~30% smaller than JPEG; supported in Safari 14+, all Chromium.
    Required for Safari 14/15 which lack AVIF support.
  -->
  <source
    srcset="hero-480w.webp 480w, hero-800w.webp 800w, hero-1200w.webp 1200w"
    sizes="(max-width: 600px) 100vw, (max-width: 1200px) 50vw, 800px"
    type="image/webp"
  >
  <!--
    fetchpriority=high: instructs the browser to fetch this resource
    before lower-priority resources. Reserve for the primary LCP element only;
    applying to 2+ images triggers bandwidth contention with CSS/fonts.

    loading=eager is the default; stated explicitly to distinguish from
    below-fold images that use loading=lazy. Do NOT set loading=lazy on LCP.

    decoding=async offloads image decode to a non-blocking thread.
    Has no effect on LCP timing but prevents jank during scroll.

    These notes sit above the tag because HTML does not allow comments
    between the attributes of an element.
  -->
  <img
    src="hero-800w.jpg"
    srcset="hero-480w.jpg 480w, hero-800w.jpg 800w, hero-1200w.jpg 1200w"
    sizes="(max-width: 600px) 100vw, (max-width: 1200px) 50vw, 800px"
    alt="Landscape hero image showing mountain ridge at sunrise"
    width="800"
    height="500"
    fetchpriority="high"
    loading="eager"
    decoding="async"
  >
</picture>

Tradeoff: width and height attributes prevent Cumulative Layout Shift (CLS) by reserving space before the image loads. Without them, browsers reflow the page when the image dimensions are known, causing CLS scores to spike. The values must match the intrinsic size of the src fallback image.
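How the sizes and srcset values in the snippet combine can be approximated in a few lines. This is a simplified model, not the browser's algorithm: slotWidth and pickCandidate are hypothetical helpers, only (max-width: Npx) conditions with vw or px lengths are handled, and real browsers may choose differently (for example, reusing a larger cached candidate).

```javascript
// Resolve a simplified `sizes` list to a slot width in CSS pixels.
function slotWidth(sizes, viewportWidth) {
  for (const { maxWidth, length } of sizes) {
    if (maxWidth === undefined || viewportWidth <= maxWidth) {
      return length.unit === 'vw' ? (viewportWidth * length.value) / 100 : length.value;
    }
  }
  return viewportWidth; // no entry matched: fall back to 100vw
}

// Pick the smallest candidate that covers slot width × device pixel ratio,
// or the largest one available when none is wide enough.
function pickCandidate(candidates, sizes, viewportWidth, dpr) {
  const needed = slotWidth(sizes, viewportWidth) * dpr;
  const sorted = [...candidates].sort((a, b) => a.width - b.width);
  return sorted.find((c) => c.width >= needed) ?? sorted.at(-1);
}

// Mirrors: sizes="(max-width: 600px) 100vw, (max-width: 1200px) 50vw, 800px"
const heroSizes = [
  { maxWidth: 600, length: { value: 100, unit: 'vw' } },
  { maxWidth: 1200, length: { value: 50, unit: 'vw' } },
  { length: { value: 800, unit: 'px' } },
];
const heroCandidates = [
  { url: 'hero-480w.avif', width: 480 },
  { url: 'hero-800w.avif', width: 800 },
  { url: 'hero-1200w.avif', width: 1200 },
];

console.log(pickCandidate(heroCandidates, heroSizes, 390, 2).url);  // hero-800w.avif (390 × 2 = 780)
console.log(pickCandidate(heroCandidates, heroSizes, 1000, 2).url); // hero-1200w.avif (500 × 2 = 1000)
console.log(pickCandidate(heroCandidates, heroSizes, 1440, 1).url); // hero-800w.avif (800 × 1 = 800)
```

The model also shows where the candidate list runs out: a 1440 px viewport at 2× needs 1600 px, so the 1200w file is upscaled unless a wider variant is generated.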


Pipeline integration

Image transcoding at build time

Sharp (Node.js, backed by libvips) is the standard tool for build-time image conversion. It operates on streams and processes multiple output formats in a single decode pass, making it significantly faster than invoking avifenc and cwebp separately for every source image.

// scripts/optimize-images.mjs
import sharp from 'sharp';
import { glob } from 'glob';
import { mkdir } from 'node:fs/promises';
import { basename, dirname, extname, join } from 'node:path';

const sources = await glob('src/media/**/*.{jpg,png}');

for (const src of sources) {
  // Mirror src/... into dist/... and create the folder first:
  // toFile() fails if the output directory does not exist.
  const dir  = dirname(src).replace(/^src\//, 'dist/');
  const name = basename(src, extname(src));
  await mkdir(dir, { recursive: true });
  const pipe = sharp(src);

  await pipe
    .clone()
    // quality=80 maps to AVIF quantizer ~28 internally.
    // effort=4 balances encode speed (0=fast, 9=slowest).
    // chromaSubsampling='4:2:0' is correct for photography;
    // use '4:4:4' for text-heavy UI screenshots.
    .avif({ quality: 80, effort: 4, chromaSubsampling: '4:2:0' })
    .toFile(join(dir, `${name}.avif`));

  await pipe
    .clone()
    // quality=82 for WebP is perceptually equivalent to AVIF q=80.
    .webp({ quality: 82, smartSubsample: true })
    .toFile(join(dir, `${name}.webp`));

  await pipe
    .clone()
    // JPEG fallback: mozjpeg encoder reduces file size ~15% vs libjpeg.
    .jpeg({ quality: 85, mozjpeg: true })
    .toFile(join(dir, `${name}.jpg`));
}

Video transcoding pipeline

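For short video clips (background loops, product demos), a two-format FFmpeg pipeline producing AV1 (MP4) and VP9 (WebM) covers current Chromium, Firefox, and Safari 17+. Older Safari versions still need an H.264 (MP4) fallback source alongside these two outputs: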

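#!/usr/bin/env bash
# transcode.sh: two-format video pipeline for web delivery
# Abort on the first failed command, unset variable, or broken pipe.
set -euo pipefail
INPUT="${1:?usage: transcode.sh <input-video>}"
STEM="${INPUT%.*}"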

# AV1 via SVT-AV1 encoder (libsvtav1). crf=30 targets ~60% smaller than H.264.
# preset=6 balances speed vs compression (0=slowest, 13=fastest).
# -movflags +faststart places the MP4 moov atom at the front for progressive play.
ffmpeg -i "$INPUT" \
  -c:v libsvtav1 -crf 30 -preset 6 \
  -c:a libopus -b:a 128k \
  -movflags +faststart \
  "${STEM}-av1.mp4"

# VP9 via libvpx-vp9. -b:v 0 enables constant quality mode (required for -crf).
# crf=33 for VP9 is roughly perceptually equivalent to AV1 crf=30.
# -deadline good / -cpu-used 4 balances quality vs CPU cost in CI.
ffmpeg -i "$INPUT" \
  -c:v libvpx-vp9 -b:v 0 -crf 33 \
  -deadline good -cpu-used 4 \
  -c:a libopus -b:a 128k \
  "${STEM}-vp9.webm"

Tradeoff: SVT-AV1 at preset=6 is approximately 3× slower than libvpx-vp9 for equivalent quality. For CI pipelines processing many videos, consider parallelising across CPU cores or offloading to a cloud transcoding API (AWS MediaConvert, Cloudflare Stream) for trunk builds, reserving local FFmpeg for development previews.
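Parallelising across cores usually means capping how many encodes run at once, since each FFmpeg process is itself multi-threaded. A minimal sketch of such a cap follows; runWithLimit is a hypothetical helper, and the tasks here are stand-in timers where a real pipeline would wrap a child-process call to the transcode script.

```javascript
// Run async tasks with at most `limit` in flight at once.
// `tasks` is an array of zero-argument functions that return promises.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker pulls the next unclaimed task until none remain.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.min(limit, tasks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results; // same order as `tasks`
}

// Stand-in jobs: each "encode" just waits a few milliseconds.
const fakeEncodes = ['a.mov', 'b.mov', 'c.mov'].map(
  (file) => () => new Promise((resolve) => setTimeout(() => resolve(`${file} done`), 10)),
);

runWithLimit(fakeEncodes, 2).then((log) => console.log(log));
// [ 'a.mov done', 'b.mov done', 'c.mov done' ]
```

A limit well below the core count tends to work better for AV1, because SVT-AV1 already spreads a single encode across many threads.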

Nginx server configuration

Cache-Control headers and MIME type configuration work together in the server block. The critical pairing is Vary: Accept alongside Cache-Control: immutable:

# nginx.conf — next-gen media delivery block
# Ensure mime.types includes: image/avif avif; image/webp webp; video/webm webm;
# (image/avif is missing from the mime.types shipped with older Nginx releases,
# so verify the file on your build rather than assuming it is present)

location ~* \.(avif|webp|jpg|jpeg|png|gif|mp4|webm)$ {
  # immutable: tells browser not to revalidate during max-age window.
  # Requires content-hashed URLs (e.g., hero.abc123.avif) so stale assets
  # are never served after a deploy.
  add_header Cache-Control "public, max-age=31536000, immutable";

  # Vary: Accept is mandatory when serving format-negotiated assets from a
  # single URL pattern. Without it, a CDN that cached a WebP response will
  # incorrectly serve WebP to an AVIF-capable browser (cache poisoning).
  add_header Vary "Accept";

  gzip off;       # AVIF/WebP/MP4/WebM are already compressed; gzip adds CPU with no benefit.
  # No `expires 1y;` here: that directive emits its own Cache-Control: max-age
  # header, which would duplicate the add_header Cache-Control line above.
  try_files $uri =404;
}
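The immutable directive in the block above depends on content-hashed URLs. One possible way to derive such a name at build time is sketched below; hashedName is a hypothetical helper, and the 8-character SHA-256 prefix is an assumed convention, not something the pipeline mandates.

```javascript
// Build a content-hashed filename such as hero.ba7816bf.avif.
// The dynamic import keeps the sketch usable from both ESM and CommonJS.
async function hashedName(fileName, contents) {
  const { createHash } = await import('node:crypto');
  const digest = createHash('sha256').update(contents).digest('hex').slice(0, 8);

  // Insert the digest before the extension; append it if there is none.
  const dot = fileName.lastIndexOf('.');
  return dot === -1
    ? `${fileName}.${digest}`
    : `${fileName.slice(0, dot)}.${digest}${fileName.slice(dot)}`;
}

// `contents` would normally be the encoded file's bytes (a Buffer).
hashedName('hero.avif', 'abc').then((name) => console.log(name));
// hero.ba7816bf.avif
```

Because the name changes whenever the bytes change, a one-year max-age never serves a stale asset: a new deploy simply references a new URL.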

Tradeoffs & failure modes

| Failure mode | Cause | Fix |
|---|---|---|
| CDN serves WebP to AVIF-capable browser | Missing Vary: Accept header | Add Vary: Accept to every location block serving format-negotiated assets |
| AVIF silently rejected by browser | Missing image/avif MIME type in Nginx | Register image/avif avif; in mime.types or types {} block |
| LCP degrades after adding fetchpriority=high to hero | Multiple elements have fetchpriority=high, starving CSS/font fetches | Reserve fetchpriority=high for exactly one element per page: the primary LCP candidate |
| CLS spikes on image load | Missing width/height attributes on <img> | Always declare intrinsic dimensions; use CSS aspect-ratio as belt-and-suspenders |
| CI build time doubles after switching to AVIF | AVIF encodes far more slowly than JPEG or WebP, especially at avifenc --speed 4 or lower | Set --speed 6 in CI; use --speed 8 for developer previews |
| WebM video blank in Safari 14/15 | Safari did not support VP9/WebM until Safari 16 | Provide <source type="video/mp4"> H.264 fallback; AV1/MP4 works in Safari 17+ |
| avifenc --min/--max produce unexpectedly large files | Quantizer scale is inverted: --min 0 --max 63 means lossless→worst, opposite of a quality slider | For q≈80 quality, use --min 20 --max 40; the lower the numbers, the better the quality |
| Vary: Accept breaks CloudFront caching | CloudFront by default does not forward or key on Accept | Configure a Cache Policy that includes Accept in the cache key |

Browser & CDN compatibility matrix

Image format support

| Feature | Safari 14 | Safari 16 | Chrome 85+ | Firefox 93+ | Edge 18+ |
|---|---|---|---|---|---|
| WebP (lossy) | Yes | Yes | Yes | Yes | Yes |
| WebP (lossless) | Yes | Yes | Yes | Yes | Yes |
| AVIF (8-bit) | No | Yes (16.0+) | Yes | Yes | Yes (121+) |
| AVIF (10-bit HDR) | No | Yes (16.4+) | Yes | Yes (partial) | Yes (121+) |
| JPEG XL | No | No | Experimental | No | No |
| <picture> element | Yes | Yes | Yes | Yes | Yes |
| srcset + sizes | Yes | Yes | Yes | Yes | Yes |
| fetchpriority attribute | No (17.2+) | No (17.2+) | Yes (102+) | Yes (132+) | Yes (102+) |

Video codec support

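| Codec | Safari 14 | Safari 16 | Chrome 85+ | Firefox 93+ | Edge 18+ |
|---|---|---|---|---|---|
| H.264 / MP4 | Yes | Yes | Yes | Yes | Yes |
| VP9 / WebM | No | No | Yes | Yes | Yes |
| H.265 / MP4 | Yes (HW) | Yes (HW) | No | No | Partial |
| AV1 / MP4 | No | No | Yes | Yes | Yes |
| WebM container | No | No | Yes | Yes | Yes |

Note: Safari supports AV1 / MP4 from version 17 onward.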

CDN format negotiation support

| Feature | Cloudflare | Fastly | AWS CloudFront | Nginx (self-hosted) |
|---|---|---|---|---|
| Vary: Accept respected in cache key | Only with Vary for Images enabled | Yes (with Vary enabled) | Requires Cache Policy config | Yes (default) |
| On-the-fly AVIF conversion | Yes (Image Resizing) | No (requires custom VCL) | No (use Lambda@Edge) | No (use libvips/Sharp) |
| Automatic WebP conversion | Yes (Polish) | No | No | No |
| Cache-Control: immutable honoured | Yes | Yes | Yes | Yes |
| Edge Accept header forwarding | Yes | Requires bereq.http.Accept | Requires Cache Policy | N/A |

Performance measurement and debugging

Track LCP and decode performance with PerformanceObserver. Separating network transfer time from decode time reveals whether poor LCP is a bandwidth problem (large file, slow CDN) or a decode problem (software fallback, oversized image relative to display size):

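// Monitor image resource timing for network cost, and the LCP entry for
// decode + paint delay. Resource Timing has no decode metric of its own:
// entry.duration is simply responseEnd - startTime.
const resourceObserver = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.initiatorType !== 'img') continue;

    // responseEnd - responseStart = transfer time (network).
    const transferMs = (entry.responseEnd - entry.responseStart).toFixed(1);
    const totalMs    = entry.duration.toFixed(1);

    console.table({ url: entry.name, totalMs, transferMs });
  }
});
resourceObserver.observe({ type: 'resource', buffered: true });

// For the LCP image, renderTime - loadTime ≈ decode + compositing time.
// renderTime is 0 for cross-origin images served without Timing-Allow-Origin.
const lcpObserver = new PerformanceObserver((list) => {
  const entry = list.getEntries().at(-1);
  if (!entry || !entry.url || !entry.renderTime) return;

  const decodeMs = (entry.renderTime - entry.loadTime).toFixed(1);
  console.table({ url: entry.url, lcpMs: entry.startTime.toFixed(1), decodeMs });
});
lcpObserver.observe({ type: 'largest-contentful-paint', buffered: true });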

Tradeoff: High decodeMs values (>50 ms on desktop) suggest the image is being software-decoded, either because hardware AVIF decode is unavailable (older Android) or because the image is decoded at a size larger than its display dimensions. Confirm the decode cost in Chrome by recording a trace in the DevTools Performance panel and inspecting the Image Decode events; chrome://media-internals reports on audio and video playback rather than still images.

For server-side validation, use curl -sI to confirm headers without downloading the body:

# Confirm Content-Type and Vary headers for a negotiated image URL.
# -H 'Accept: image/avif,image/webp,*/*' simulates a Chrome request.
curl -sI -H 'Accept: image/avif,image/webp,*/*' \
  https://example.com/images/hero.jpg \
  | grep -iE 'content-type|vary|cache-control|x-cache'

Expected output:

content-type: image/avif
vary: Accept
cache-control: public, max-age=31536000, immutable
x-cache: HIT

If content-type: image/jpeg appears despite the Accept header, either the server is not performing format negotiation or the CDN returned a cached non-negotiated response.
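The same checks can be scripted so they run on every deploy. The function below is a sketch under assumptions: auditMediaHeaders is a hypothetical name, it accepts anything with a get(name) method (a fetch Response's headers, or a Map with lowercase keys as used here), and the rules it enforces are the ones described in this section rather than a general standard.

```javascript
// Return a list of problems found in a media response's headers.
function auditMediaHeaders(headers, expectedType) {
  const problems = [];
  const get = (name) => (headers.get(name) ?? '').toLowerCase();

  const contentType = get('content-type');
  if (!contentType.startsWith(expectedType)) {
    problems.push(`content-type is "${contentType}", expected ${expectedType}`);
  }

  // Vary may list several header names; Accept must be one of them.
  const vary = get('vary').split(',').map((value) => value.trim());
  if (!vary.includes('accept')) problems.push('Vary does not list Accept');

  const cacheControl = get('cache-control');
  if (!/max-age=\d+/.test(cacheControl)) problems.push('Cache-Control lacks max-age');
  if (!cacheControl.includes('immutable')) problems.push('Cache-Control lacks immutable');

  return problems;
}

// A Map stands in for response.headers in this example.
const sampleHeaders = new Map([
  ['content-type', 'image/jpeg'],
  ['cache-control', 'public, max-age=31536000, immutable'],
]);
console.log(auditMediaHeaders(sampleHeaders, 'image/avif'));
// [ 'content-type is "image/jpeg", expected image/avif', 'Vary does not list Accept' ]
```

An empty array means the response matches the expected output shown above; anything else can fail the deploy step.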


Accessibility and progressive enhancement

Progressive enhancement requires that every format tier degrades gracefully: AVIF → WebP → JPEG for images; AV1/MP4 → VP9/WebM → H.264/MP4 for video. Accessibility requirements apply at every tier:

  • Every informative <img> requires a descriptive alt attribute. Decorative images (background textures, purely visual dividers) use alt="" and optionally aria-hidden="true" to remove them from the accessibility tree.
  • Animated media (GIF replacements, background loops) must respect @media (prefers-reduced-motion: reduce). Either pause the animation or replace it with a static poster image.
  • Video content that conveys information must include synchronized captions (WebVTT <track kind="captions">) and audio descriptions for visual-only content.
  • SVG diagrams embedded inline require <title>, <desc>, role="img", and aria-label on the root element. Interactive SVGs (zoom, toggle) require keyboard event handlers.
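The reduced-motion requirement in the list above can be handled with a small amount of script. The sketch below is one possible approach, with hypothetical helper names (motionPlan, applyMotionPlan); it keeps the decision logic separate from the DOM so it can be exercised outside a browser.

```javascript
// Decide how a looping background video should behave for this user.
function motionPlan(prefersReducedMotion) {
  return prefersReducedMotion
    ? { autoplay: false, loop: false, showPoster: true }
    : { autoplay: true, loop: true, showPoster: false };
}

// Apply the plan to a <video> element (or any object with the same fields).
// A paused video that never started playing keeps showing its poster image.
function applyMotionPlan(video, plan) {
  video.autoplay = plan.autoplay;
  video.loop = plan.loop;
  if (plan.showPoster && typeof video.pause === 'function') video.pause();
  return video;
}

// In a browser, the preference comes from matchMedia:
//   const query = window.matchMedia('(prefers-reduced-motion: reduce)');
//   applyMotionPlan(document.querySelector('video'), motionPlan(query.matches));
console.log(motionPlan(true)); // { autoplay: false, loop: false, showPoster: true }
```

Listening for the media query's change event and reapplying the plan covers users who toggle the setting while the page is open.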