Core Media Fundamentals & Next-Gen Formats
Every millisecond of Largest Contentful Paint delay and every kilobyte of unnecessary payload translates directly into lost conversions, higher infrastructure bills, and lower Core Web Vitals scores. Getting media delivery right means making precise decisions at every layer: which codec to encode with, which quantizer range to target, how HTTP headers instruct CDN caches to store format-negotiated variants, and how build pipelines automate the transcoding work at scale. This section covers the foundational theory and production patterns across all of those layers.
What this section covers
The topics below make up the architecture of next-gen media delivery. Each area is covered in depth on its own dedicated page.
AVIF vs WebP Compression Benchmarks — SSIM/VMAF scoring methodology, file-size comparisons at matched perceptual quality, and the conditions under which AVIF’s superior compression justifies its slower encode time versus WebP’s broader decode support.
MIME Type Configuration for Modern Media Servers — Correct Content-Type registration for image/avif, image/webp, video/webm, and video/mp4 on Nginx, Apache, and Caddy, plus codec strings such as codecs=av01 in <source type> attributes. Misconfigured MIME types can cause browsers to reject media without any visible error.
Cache-Control Headers for Image and Video Assets — max-age, immutable, stale-while-revalidate, and the Vary: Accept header, which prevents CDNs from serving a cached WebP to an AVIF-capable client.
Understanding Video Codecs: VP9 vs H.265 vs AV1 — Codec lineage, hardware decoding matrices, licensing considerations, and encode-time vs compression tradeoffs for streaming and background video use cases.
Pipeline overview
The diagram below shows how format selection, HTTP negotiation, CDN caching, and client-side loading interact end-to-end. Each stage is a decision point, and a wrong setting at any stage causes errors further downstream.
Core theory: codecs, compression, and HTTP semantics
Image codec fundamentals
AVIF is derived from the AV1 video codec. It encodes still images using the same intra-frame prediction, transform, and entropy coding tools that make AV1 competitive with HEVC for video. The key avifenc parameters are as follows:

- --min and --max set the quantizer range. 0 is lossless and 63 is maximum loss, which is the inverse of most quality sliders.
- --speed runs from 0 to 10. Lower values are slower but give smaller files at a given quality.
- --depth sets 8, 10, or 12 bits per channel.

At matched perceptual quality as measured by SSIM or VMAF, AVIF typically produces files 30–50% smaller than JPEG and 15–25% smaller than WebP.
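A small helper can keep these avifenc parameters inside their documented ranges before a build script shells out to the encoder. This is a sketch with several assumptions:

- The function name `avifencArgs` is hypothetical.
- Its defaults are illustrative: `--min 20 --max 40` comes from the failure-mode table and `--speed 6` from the CI note.
- Newer libavif releases may steer users toward `-q` instead of `--min`/`--max`, so check `avifenc --help` for your version.

```javascript
// Build an avifenc argument list, rejecting values outside the ranges
// described in the text. Flag names (--min/--max/--speed/--depth/--yuv)
// are assumed to match your avifenc version; verify with avifenc --help.
function avifencArgs(input, output, opts = {}) {
  const { min = 20, max = 40, speed = 6, depth = 8, yuv = '420' } = opts;
  const inRange = (v, lo, hi) => Number.isInteger(v) && v >= lo && v <= hi;
  // Quantizer scale is inverted: 0 = lossless, 63 = worst quality.
  if (!inRange(min, 0, 63) || !inRange(max, 0, 63) || min > max) {
    throw new RangeError(`quantizer range must satisfy 0 <= min <= max <= 63 (got ${min}..${max})`);
  }
  if (!inRange(speed, 0, 10)) throw new RangeError(`speed must be 0-10 (got ${speed})`);
  if (![8, 10, 12].includes(depth)) throw new RangeError(`depth must be 8, 10 or 12 (got ${depth})`);
  // 420 suits photographs; 444 avoids chroma bleeding on UI graphics.
  if (!['444', '422', '420', '400'].includes(yuv)) throw new RangeError(`unsupported yuv mode ${yuv}`);
  return [
    '--min', String(min), '--max', String(max),
    '--speed', String(speed), '--depth', String(depth), '--yuv', yuv,
    input, output, // avifenc takes input and output as positional arguments
  ];
}
```

For a UI screenshot you might call `avifencArgs('ui.png', 'ui.avif', { yuv: '444' })`. You would then pass the array to `child_process.execFile('avifenc', args)`. Building an argument array rather than a shell string avoids quoting problems with unusual filenames.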
WebP uses the VP8 intra-frame codec for lossy compression. It has a purpose-built lossless mode for graphics with sharp edges and flat regions. Its quality parameter (cwebp -q 0–100) runs in the conventional direction: higher means better quality. WebP has wider decode support than AVIF: Chrome 23+, Firefox 65+, and Safari 14+ on macOS 11 Big Sur or later. That makes it the safer fallback tier behind AVIF.
Chroma subsampling governs how colour information is sampled relative to luminance.

- 4:2:0 halves colour resolution in both dimensions. Each chroma plane therefore carries a quarter of the luma samples, and the raw frame holds half as many samples as 4:4:4. On photographic content the visible impact is minimal.
- 4:4:4 retains full colour fidelity. This matters for text overlaid on images, UI screenshots, and product imagery where colour accuracy is brand-critical.

Encoding UI graphics with 4:2:0 (for example, avifenc --yuv 420) produces visible chroma bleeding on high-contrast edges. Use --yuv 444 for that content.
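The sample arithmetic behind 4:2:0 versus 4:4:4 is easy to check. The sketch below uses a hypothetical `yuvSampleCounts` helper that counts raw, uncompressed samples per frame. Compressed file sizes usually shrink by less than these raw counts suggest, because smooth chroma planes already compress well.

```javascript
// Horizontal and vertical chroma divisors for common subsampling modes.
const CHROMA_FACTORS = {
  '4:4:4': [1, 1], // chroma at full resolution
  '4:2:2': [2, 1], // chroma halved horizontally only
  '4:2:0': [2, 2], // chroma halved in both dimensions
};

// Raw (pre-compression) sample counts for one YUV frame.
function yuvSampleCounts(width, height, mode) {
  const factors = CHROMA_FACTORS[mode];
  if (!factors) throw new Error(`unknown subsampling mode: ${mode}`);
  const [fx, fy] = factors;
  const luma = width * height;
  // Each of the two chroma planes (Cb, Cr) rounds up for odd dimensions.
  const chromaPlane = Math.ceil(width / fx) * Math.ceil(height / fy);
  return { luma, chroma: 2 * chromaPlane, total: luma + 2 * chromaPlane };
}
```

Take a 1920×1080 frame as an example. Under 4:4:4 it holds 6,220,800 samples; under 4:2:0 it holds 3,110,400. That is half the total, while each chroma plane drops to a quarter of its full size.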
Video codec fundamentals
For background video, short clips, and streaming, codec selection involves three axes: compression efficiency, hardware decode availability, and licensing cost.

- H.264 (AVC) decodes in hardware on every device made in the last decade and carries no runtime royalty for free distribution. It is the universal baseline.
- VP9 was developed by Google as a royalty-free alternative to H.265. It achieves roughly 30–50% better compression than H.264 at matched quality.
- AV1 improves on VP9 by another 20–30% but requires more complex decode logic. Hardware AV1 decode ships on Apple A17 Pro/M3 and later, recent Intel Arc, AMD RDNA3, and Qualcomm Snapdragon 8 Gen 2 and later. Older chips, including Apple M1/M2, fall back to costly software decode.
- H.265 (HEVC) matches or exceeds AV1 compression with some encoders but carries mandatory patent royalties.

Full codec analysis and hardware matrix: Understanding Video Codecs: VP9 vs H.265 vs AV1.
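Browsers already pick the first playable `<source>` inside `<video>`. The same ordering logic is sometimes needed in script, for example when a custom player chooses a file. This is a minimal sketch under stated assumptions:

- The helper name `pickVideoSource` and the file names are hypothetical.
- The codec strings are illustrative and should match your actual encodes.
- `canPlayType` is injected. In a page, pass `(t) => document.createElement('video').canPlayType(t)`.

```javascript
// Candidate encodes in preference order: smallest codec first, H.264 last.
const VIDEO_SOURCES = [
  { src: 'clip-av1.mp4', type: 'video/mp4; codecs="av01.0.05M.08"' },
  { src: 'clip-vp9.webm', type: 'video/webm; codecs="vp9"' },
  { src: 'clip-h264.mp4', type: 'video/mp4; codecs="avc1.42E01E"' },
];

// Return the first source the client reports it can play, or null.
function pickVideoSource(sources, canPlayType) {
  for (const source of sources) {
    // canPlayType returns '' (no), 'maybe', or 'probably'.
    if (canPlayType(source.type) !== '') return source;
  }
  return null;
}
```

`canPlayType` only reports whether a format can be decoded, not whether decoding is hardware-accelerated. For the software-decode problem described above, `navigator.mediaCapabilities.decodingInfo()` is the closer signal, since it returns a `powerEfficient` flag.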
HTTP negotiation semantics
The Accept request header signals which image formats the browser supports. Chrome's image requests list image/avif and image/webp ahead of a */*;q=0.8 wildcard.

Servers and CDNs that serve format-negotiated content from a single URL must include Vary: Accept in the response. This makes intermediate caches store a separate variant per Accept value. Without it, a CDN that cached a WebP response will serve that WebP to the next AVIF-capable client, negating the format negotiation entirely.
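For origins that negotiate formats themselves, the decision comes down to two rules:

- Read the Accept header to choose the format.
- Always send Vary: Accept, including on the JPEG fallback, because that response is also one variant of the URL.

The sketch below assumes a hypothetical `negotiateImage` helper called from whatever server framework you use. Its Accept parsing is deliberately simplified: it honours q-values for the two image types and ignores wildcards.

```javascript
// Parse "type;q=0.8, other" into a Map of lower-cased media type -> q-value.
function parseAccept(header) {
  const prefs = new Map();
  for (const part of (header || '').split(',')) {
    const [type, ...params] = part.trim().split(';').map((s) => s.trim());
    if (!type) continue;
    const qParam = params.find((p) => p.startsWith('q='));
    const q = qParam ? Number.parseFloat(qParam.slice(2)) : 1;
    prefs.set(type.toLowerCase(), Number.isNaN(q) ? 0 : q);
  }
  return prefs;
}

// Pick the smallest format the client accepts; every branch sets Vary: Accept.
function negotiateImage(acceptHeader) {
  const prefs = parseAccept(acceptHeader);
  for (const [type, ext] of [['image/avif', 'avif'], ['image/webp', 'webp']]) {
    if ((prefs.get(type) ?? 0) > 0) {
      return { ext, headers: { 'Content-Type': type, Vary: 'Accept' } };
    }
  }
  return { ext: 'jpg', headers: { 'Content-Type': 'image/jpeg', Vary: 'Accept' } };
}
```

A client that sends `image/avif;q=0` gets WebP, because an explicit zero q-value means "not acceptable".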
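Content-Type must match the actual bytes. Nginx assigns Content-Type by file extension from its mime.types file and never inspects file contents. AVIF (and, in some older releases, WebM) therefore needs explicit MIME type registration. Unregistered extensions fall back to default_type, which the stock nginx.conf sets to application/octet-stream.

A missing or wrong Content-Type can cause browsers such as Firefox and Safari to reject the resource silently. No error appears in the console; you see only a missing image or a blank video. Correct registration is covered in depth at MIME type configuration for modern media servers.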
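Because Nginx trusts the extension, a deploy check can compare each file's leading bytes with the type its extension implies. The sketch below uses a hypothetical `sniffImageType` helper. It recognises the JPEG, PNG, WebP, and AVIF signatures. AVIF detection reads the ISOBMFF `ftyp` box and looks for the `avif` (still) or `avis` (sequence) brand.

```javascript
// Identify an image format from its leading bytes (a Uint8Array or Buffer).
function sniffImageType(bytes) {
  const ascii = (start, end) => String.fromCharCode(...bytes.subarray(start, end));
  // JPEG: SOI marker FF D8 followed by another marker byte FF.
  if (bytes.length >= 3 && bytes[0] === 0xff && bytes[1] === 0xd8 && bytes[2] === 0xff) {
    return 'image/jpeg';
  }
  // PNG: fixed 8-byte signature.
  if (bytes.length >= 8 && ascii(0, 8) === '\x89PNG\r\n\x1a\n') return 'image/png';
  // WebP: RIFF container whose form type is WEBP.
  if (bytes.length >= 12 && ascii(0, 4) === 'RIFF' && ascii(8, 12) === 'WEBP') return 'image/webp';
  // ISOBMFF: 4-byte big-endian box size, 'ftyp', major brand, minor version,
  // then compatible brands up to the end of the box.
  if (bytes.length >= 16 && ascii(4, 8) === 'ftyp') {
    const boxSize = ((bytes[0] << 24) | (bytes[1] << 16) | (bytes[2] << 8) | bytes[3]) >>> 0;
    const end = Math.min(boxSize, bytes.length);
    const brands = [ascii(8, 12)];
    for (let i = 16; i + 4 <= end; i += 4) brands.push(ascii(i, i + 4));
    if (brands.includes('avif') || brands.includes('avis')) return 'image/avif';
  }
  return null; // unknown or too short to tell
}
```

In Node, you can pass the Buffer returned by `fs.promises.readFile(path)`, since a Buffer is a Uint8Array. The first 64 bytes are usually enough. Flag any file whose sniffed type disagrees with the Content-Type its extension maps to.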
Reference data: compression and decode performance
The table below reflects typical results from encoding a 2 MP photographic JPEG at visual equivalence (SSIM ≈ 0.95). Decode times are measured on a 2021 MacBook Pro (M1) running Chrome 120; mobile figures (Pixel 7) are higher due to software-path AV1 decode on older SoCs.
| Format | Typical file size | vs JPEG baseline | SSIM at size | Encode time (Sharp / libvips) | Hardware decode | Safari 14 | Safari 16 | Chrome 85+ | Firefox 93+ | Edge 18+ |
|---|---|---|---|---|---|---|---|---|---|---|
| JPEG | 280 KB | — | 0.95 | ~80 ms | Universal | Yes | Yes | Yes | Yes | Yes |
| WebP (q=80) | 195 KB | −30% | 0.95 | ~110 ms | Partial (Chrome/Edge) | Yes | Yes | Yes | Yes | Yes |
| AVIF (q=80 equiv) | 155 KB | −45% | 0.95 | ~900 ms | Apple Silicon, Chrome 121+ | No | Yes | Yes | Yes | Yes (121+) |
| JPEG XL | 145 KB | −48% | 0.95 | ~1200 ms | None currently | No | No | Experimental | No | No |
Note on AVIF encode time: the ~900 ms figure uses avifenc --speed 6. Dropping to --speed 4 or lower increases encode time 3–5× for marginal quality gains.

In CI/CD pipelines, use --speed 6 or --speed 8 for trunk builds. Reserve slower presets for release builds or offline batch jobs, where encode time matters less. Sharp exposes the same tradeoff as effort, which runs in the opposite direction: 0 is fastest and 9 is slowest.
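To see what the speed setting costs a CI run, multiply it out. The figures reuse the ~900 ms speed-6 time from the table and the 3–5× slowdown from this note. The helper name and the 2,000-image, 4-worker scenario are illustrative, and the sketch assumes every image takes the same time.

```javascript
// Wall-clock minutes to encode `imageCount` images at `msPerImage` each,
// running `workers` encodes at a time (the last round may be partial).
function estimateEncodeMinutes(imageCount, msPerImage, workers = 1) {
  if (!Number.isInteger(workers) || workers < 1) {
    throw new RangeError('workers must be a positive integer');
  }
  const rounds = Math.ceil(imageCount / workers);
  return (rounds * msPerImage) / 60000;
}
```

For 2,000 images on 4 workers, `estimateEncodeMinutes(2000, 900, 4)` gives 7.5 minutes. A 3× slower preset (2,700 ms per image) raises that to 22.5 minutes, and a 5× slower one (4,500 ms) to 37.5 minutes.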
Video format reference
| Codec | Container | Bitrate vs H.264 | HW decode (2024+) | Royalty-free | Safari 14 | Chrome 85+ | Firefox 93+ | Edge 18+ |
|---|---|---|---|---|---|---|---|---|
| H.264 (AVC) | MP4 | baseline | Universal | No (MPEG-LA) | Yes | Yes | Yes | Yes |
| VP9 | WebM | −30–50% | Partial (no Apple) | Yes | No | Yes | Yes | Yes |
| H.265 (HEVC) | MP4 | −40–50% | Wide (incl. Apple) | No (HEVC Advance) | Yes | HW only (107+) | No | Partial |
| AV1 | MP4/WebM | −50–60% | Apple A17 Pro/M3+, Intel 12+, Snapdragon 8 Gen 2+ | Yes | No (17+ with HW decode) | Yes (Chrome 90+) | Yes | Yes |
Canonical delivery pattern
The following snippet shows a production-ready format fallback chain for a Largest Contentful Paint (LCP) image, with the key attributes annotated.
<picture>
<!--
AVIF source: highest compression, ~45% smaller than JPEG at matched quality.
Served to Chrome 85+, Firefox 93+, Safari 16+, Edge 121+.
The server must send Content-Type: image/avif for .avif files; see the MIME type guide.
-->
<source
srcset="hero-480w.avif 480w, hero-800w.avif 800w, hero-1200w.avif 1200w"
sizes="(max-width: 600px) 100vw, (max-width: 1200px) 50vw, 800px"
type="image/avif"
>
<!--
WebP fallback: ~30% smaller than JPEG; supported in Safari 14+, all Chromium.
Required for Safari 14/15 which lack AVIF support.
-->
<source
srcset="hero-480w.webp 480w, hero-800w.webp 800w, hero-1200w.webp 1200w"
sizes="(max-width: 600px) 100vw, (max-width: 1200px) 50vw, 800px"
type="image/webp"
>
<!--
fetchpriority=high: hints the browser to fetch this image at high priority,
ahead of other images. Reserve it for the primary LCP element only;
applying it to 2+ images can cause bandwidth contention with CSS/fonts.
loading=eager is the default; it is stated explicitly to distinguish this
image from below-fold images that use loading=lazy. Do NOT set loading=lazy on LCP.
decoding=async hints that the browser may decode this image without delaying
the presentation of other content, which helps avoid jank during scroll.
-->
<img
src="hero-800w.jpg"
srcset="hero-480w.jpg 480w, hero-800w.jpg 800w, hero-1200w.jpg 1200w"
sizes="(max-width: 600px) 100vw, (max-width: 1200px) 50vw, 800px"
alt="Landscape hero image showing mountain ridge at sunrise"
width="800"
height="500"
fetchpriority="high"
loading="eager"
decoding="async"
>
</picture>
Tradeoff: width and height attributes prevent Cumulative Layout Shift (CLS) by letting the browser reserve space before the image loads. Without them, the page reflows once the image dimensions become known, and CLS scores spike. The values should match the intrinsic aspect ratio of the fallback image (800×500 here); CSS can still scale the rendered size.
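Keeping three srcset lists in sync by hand is error-prone. A hypothetical generator can emit the same fallback chain from one list of widths. It assumes the `hero-480w.avif` naming used above. It also hard-codes the LCP-only attributes (fetchpriority, loading, decoding), so use it only for the hero image.

```javascript
// Escape a value for use inside a double-quoted HTML attribute.
const escapeAttr = (value) =>
  String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;').replace(/</g, '&lt;');

// "hero", [480, 800], "avif" -> "hero-480w.avif 480w, hero-800w.avif 800w"
function srcset(stem, widths, ext) {
  return widths.map((w) => `${stem}-${w}w.${ext} ${w}w`).join(', ');
}

// Emit the AVIF -> WebP -> JPEG chain for the LCP image.
function pictureMarkup({ stem, widths, sizes, alt, width, height, fallbackWidth }) {
  const sources = [['avif', 'image/avif'], ['webp', 'image/webp']].map(
    ([ext, type]) =>
      `  <source srcset="${srcset(stem, widths, ext)}" sizes="${sizes}" type="${type}">`
  );
  return [
    '<picture>',
    ...sources,
    `  <img src="${stem}-${fallbackWidth}w.jpg" srcset="${srcset(stem, widths, 'jpg')}"`,
    `       sizes="${sizes}" alt="${escapeAttr(alt)}" width="${width}" height="${height}"`,
    '       fetchpriority="high" loading="eager" decoding="async">',
    '</picture>',
  ].join('\n');
}
```

Generating the markup at build time, for example in a static-site template, keeps the widths in one place. The same list can also drive the image pipeline, so every referenced file actually exists.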
Pipeline integration
Image transcoding at build time
Sharp (Node.js, backed by libvips) is a widely used tool for build-time image conversion. It runs in-process and can produce several output formats from one input via clone(). This is typically much faster than spawning avifenc and cwebp separately for every source image.
// scripts/optimize-images.mjs
import sharp from 'sharp';
import { glob } from 'glob';
import { mkdir } from 'node:fs/promises';
import { basename, dirname, extname } from 'node:path';
const sources = await glob('src/media/**/*.{jpg,png}');
for (const src of sources) {
// Mirror src/media/... into dist/media/...; sharp's toFile() does not
// create missing directories, so create the output directory first.
const dir = dirname(src).replace(/^src\//, 'dist/');
await mkdir(dir, { recursive: true });
const name = basename(src, extname(src));
const pipe = sharp(src);
await pipe
.clone()
// quality runs 0–100 (higher is better), unlike avifenc's inverted quantizer.
// effort trades encode speed for size (0=fastest, 9=slowest; default 4).
// chromaSubsampling='4:2:0' suits photography; sharp's AVIF default is
// '4:4:4', which is the better choice for text-heavy UI screenshots.
.avif({ quality: 80, effort: 4, chromaSubsampling: '4:2:0' })
.toFile(`${dir}/${name}.avif`);
await pipe
.clone()
// quality=82 for WebP is intended to roughly match AVIF q=80;
// confirm the pairing with SSIM/VMAF on your own images.
.webp({ quality: 82, smartSubsample: true })
.toFile(`${dir}/${name}.webp`);
await pipe
.clone()
// JPEG fallback: mozjpeg-style settings typically shrink files ~15% vs plain libjpeg.
.jpeg({ quality: 85, mozjpeg: true })
.toFile(`${dir}/${name}.jpg`);
}
Video transcoding pipeline
For short video clips (background loops, product demos), two FFmpeg encodes cover Chromium-based browsers and Firefox: AV1 in MP4 and VP9 in WebM. Safari additionally needs an H.264 MP4 fallback unless the device has hardware AV1 decode.
#!/usr/bin/env bash
# transcode.sh — two-format video pipeline for web delivery
set -euo pipefail
INPUT="${1:?usage: transcode.sh <input-video>}"
STEM="${INPUT%.*}"
# AV1 via SVT-AV1 encoder (libsvtav1). crf=30 is a mid-range quality target;
# at matched quality AV1 output is typically 50–60% smaller than H.264 (see table above).
# preset=6 balances speed vs compression (0=slowest, 13=fastest).
# -movflags +faststart places the MP4 moov atom at the front for progressive play.
ffmpeg -i "$INPUT" \
-c:v libsvtav1 -crf 30 -preset 6 \
-c:a libopus -b:a 128k \
-movflags +faststart \
"${STEM}-av1.mp4"
# VP9 via libvpx-vp9. -b:v 0 enables constant quality mode (required for -crf).
# crf=33 for VP9 is roughly perceptually equivalent to AV1 crf=30.
# -deadline good / -cpu-used 4 balances quality vs CPU cost in CI.
ffmpeg -i "$INPUT" \
-c:v libvpx-vp9 -b:v 0 -crf 33 \
-deadline good -cpu-used 4 \
-c:a libopus -b:a 128k \
"${STEM}-vp9.webm"
Tradeoff: SVT-AV1 at preset=6 can be roughly 3× slower than libvpx-vp9 at equivalent quality, depending on hardware. For CI pipelines processing many videos, parallelise encodes across CPU cores. Alternatively, offload trunk builds to a cloud transcoding service (AWS MediaConvert, Cloudflare Stream) and reserve local FFmpeg for development previews.
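Parallelising locally can be as simple as capping how many encodes run at once. This is a minimal sketch with a hypothetical `mapWithConcurrency` helper:

- Each worker would typically wrap one FFmpeg invocation, for example `child_process.execFile` promisified with `util.promisify`.
- A limit near `os.cpus().length` is a reasonable starting point.
- SVT-AV1 already uses several threads per encode, so a lower limit may work better for AV1 jobs.

```javascript
// Run async jobs with at most `limit` in flight; results keep input order.
async function mapWithConcurrency(items, limit, worker) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new RangeError('limit must be a positive integer');
  }
  const results = new Array(items.length);
  let next = 0; // next unclaimed index; safe because JS callbacks don't interleave mid-statement
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }
  // Start up to `limit` runners; each pulls items until the list is exhausted.
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, () => runner()));
  return results;
}
```

If any job rejects, the returned promise rejects with that error. Runners that are already busy still finish their current job, so clean up partial outputs in the caller if needed.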
Nginx server configuration
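Cache-Control headers and MIME type configuration work together in the server block. The block below pairs Cache-Control: immutable with Vary: Accept. As written, it serves each file by its own extension.

Vary: Accept becomes essential once the same URL can return different formats. Examples are a map on $http_accept feeding try_files, or a CDN image service. When every format has its own URL, Vary: Accept mainly fragments CDN caches.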
# nginx.conf — next-gen media delivery block
# Ensure mime.types includes: image/avif avif; image/webp webp; video/webm webm;
# (older Nginx releases ship a mime.types that may lack some of these, notably avif)
location ~* \.(avif|webp|jpg|jpeg|png|gif|mp4|webm)$ {
# immutable: tells browser not to revalidate during max-age window.
# Requires content-hashed URLs (e.g., hero.abc123.avif) so stale assets
# are never served after a deploy.
add_header Cache-Control "public, max-age=31536000, immutable";
# Vary: Accept is mandatory when format-negotiated assets share a single URL
# (e.g. /hero.jpg returning AVIF or WebP). Without it, a CDN that cached a WebP
# response will serve WebP to an AVIF-capable browser. If every format has its
# own URL, as with try_files $uri below, it can be dropped.
add_header Vary "Accept";
gzip off; # AVIF/WebP/MP4/WebM are already compressed; gzip adds CPU with no benefit.
# No `expires` directive here: it would emit a second Cache-Control header
# alongside the add_header above. Expires matters only for HTTP/1.0 caches.
try_files $uri =404;
}
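The immutable directive is only safe on content-hashed URLs. A hypothetical `cacheControlFor` helper could choose headers per path, for example in a deploy script that uploads media to object storage instead of Nginx. Two values are assumptions rather than values from the config above: the hash pattern (eight or more hex characters before the extension) and the one-hour fallback.

```javascript
// Matches names like hero.3f9a1c2b.avif: a hex content hash before the extension.
const HASHED = /\.[0-9a-f]{8,}\.(avif|webp|jpe?g|png|gif|mp4|webm)$/i;

// Long-lived and immutable only when the URL changes whenever the bytes do.
function cacheControlFor(path) {
  if (HASHED.test(path)) return 'public, max-age=31536000, immutable';
  // Unhashed files can be replaced in place, so keep their lifetime short.
  return 'public, max-age=3600';
}
```

The hash itself would come from the build step, for example a truncated SHA-256 of the encoded file. Any change to the bytes then produces a new URL, and old cache entries simply stop being requested.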
Tradeoffs & failure modes
| Failure mode | Cause | Fix |
|---|---|---|
| CDN serves WebP to AVIF-capable browser | Missing Vary: Accept header | Add Vary: Accept to every location block serving format-negotiated assets |
| AVIF silently rejected by browser | Missing image/avif MIME type in Nginx | Register image/avif avif; in mime.types or a types {} block |
| LCP degrades after adding fetchpriority=high to hero | Multiple elements have fetchpriority=high, starving CSS/font fetches | Reserve fetchpriority=high for exactly one element per page: the primary LCP candidate |
| CLS spikes on image load | Missing width/height attributes on <img> | Always declare intrinsic dimensions; use CSS aspect-ratio as belt-and-suspenders |
| CI build time doubles after switching to AVIF | Slow encoder preset (avifenc --speed 4 or lower) | Set --speed 6 in CI; use --speed 8 for developer previews |
| WebM video blank in Safari 14/15 | Safari's VP9/WebM support is missing or partial, depending on version and OS | Provide a <source type="video/mp4"> H.264 fallback; AV1/MP4 plays in Safari 17+ only on devices with hardware AV1 decode |
| avifenc --min/--max produce unexpectedly large files | Quantizer scale is inverted: --min 0 --max 63 spans lossless to worst quality, the opposite of a quality slider | Raise the quantizer values (e.g. --min 20 --max 40); lower numbers mean better quality and larger files |
| Vary: Accept breaks CloudFront caching | CloudFront by default does not forward Accept or include it in the cache key | Configure a Cache Policy that includes Accept in the cache key |
Browser & CDN compatibility matrix
Image format support
| Feature | Safari 14 | Safari 16 | Chrome 85+ | Firefox 93+ | Edge 18+ |
|---|---|---|---|---|---|
| WebP (lossy) | Yes | Yes | Yes | Yes | Yes |
| WebP (lossless) | Yes | Yes | Yes | Yes | Yes |
| AVIF (8-bit) | No | Yes (16.0+) | Yes | Yes | Yes (121+) |
| AVIF (10-bit HDR) | No | Yes (16.4+) | Yes | Yes (partial) | Yes (121+) |
| JPEG XL | No | No | Experimental | No | No |
| <picture> element | Yes | Yes | Yes | Yes | Yes |
| srcset + sizes | Yes | Yes | Yes | Yes | Yes |
| fetchpriority attribute | No | No (17.2+) | Yes (102+) | Yes (132+) | Yes (102+) |
Video codec support
| Codec | Safari 14 | Safari 16 | Chrome 85+ | Firefox 93+ | Edge 18+ |
|---|---|---|---|---|---|
| H.264 / MP4 | Yes | Yes | Yes | Yes | Yes |
| VP9 / WebM | No | No | Yes | Yes | Yes |
| H.265 / MP4 | Yes (HW) | Yes (HW) | HW only (107+) | No | Partial |
| AV1 / MP4 | No | No | Yes | Yes | Yes |
| WebM container | No | No | Yes | Yes | Yes |

Safari 17+ adds AV1/MP4 playback, but only on devices with hardware AV1 decode (Apple A17 Pro, M3, and later).
CDN format negotiation support
| Feature | Cloudflare | Fastly | AWS CloudFront | Nginx (self-hosted) |
|---|---|---|---|---|
| Vary: Accept respected in cache key | Yes (automatic) | Yes (with Vary enabled) | Requires Cache Policy config | Yes (default) |
| On-the-fly AVIF conversion | Yes (Image Resizing) | No (requires custom VCL) | No (use Lambda@Edge) | No (use libvips/Sharp) |
| Automatic WebP conversion | Yes (Polish) | No | No | No |
| Cache-Control: immutable honoured | Yes | Yes | Yes | Yes |
| Edge Accept header forwarding | Yes | Requires bereq.http.Accept | Requires Cache Policy | N/A |
Performance measurement and debugging
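Track LCP and image load performance with PerformanceObserver. Splitting the LCP image's time into network load and render delay shows which kind of problem you have:

- A bandwidth problem: a large file or a slow CDN.
- A rendering problem: expensive decode, an image much larger than its display size, or a busy main thread.

Resource Timing alone cannot show decode cost, because a resource entry ends at responseEnd.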
// Break the LCP image's time into network load and render delay.
// A resource entry's duration is responseEnd - startTime, so it never
// includes decode; the gap between responseEnd and the LCP entry's
// startTime covers decode, layout, and paint (plus main-thread blocking).
const observer = new PerformanceObserver((list) => {
const entries = list.getEntries();
const lcp = entries[entries.length - 1]; // latest LCP candidate so far
if (!lcp.url) return; // text-based LCP element: no image resource
const [res] = performance.getEntriesByName(lcp.url, 'resource');
if (!res) return;
// Cross-origin images need Timing-Allow-Origin for accurate
// responseStart and render times.
const transferMs = (res.responseEnd - res.responseStart).toFixed(1);
const loadMs = res.duration.toFixed(1);
const renderDelayMs = (lcp.startTime - res.responseEnd).toFixed(1);
console.table({
url: lcp.url,
loadMs,
transferMs,
renderDelayMs,
});
});
observer.observe({ type: 'largest-contentful-paint', buffered: true });
Tradeoff: High renderDelayMs values (>50 ms on desktop) point to a rendering bottleneck rather than a network one. The image may be expensive to decode (for example, a large AVIF on a slower device), decoded at a size larger than its display dimensions, or waiting on a busy main thread. To confirm, look for image decode events in the Chrome DevTools Performance panel. For video, chrome://media-internals reports which decoder is in use.
For server-side validation, use curl -sI to confirm headers without downloading the body:
# Confirm Content-Type and Vary headers for a negotiated image URL.
# -H 'Accept: image/avif,image/webp,*/*' simulates a Chrome request.
curl -sI -H 'Accept: image/avif,image/webp,*/*' \
https://example.com/images/hero.jpg \
| grep -iE 'content-type|vary|cache-control|x-cache'
Expected output:
content-type: image/avif
vary: Accept
cache-control: public, max-age=31536000, immutable
x-cache: HIT
If content-type: image/jpeg appears despite the Accept header, either the server is not performing format negotiation or the CDN returned a cached non-negotiated response.
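The curl check can also be automated, for example in a post-deploy smoke test. This sketch uses a hypothetical `auditMediaHeaders` helper. It takes a response's headers as a plain object with lower-case names, such as `Object.fromEntries(response.headers)` after `fetch()`. It reports the problems listed in the failure-mode table. The caller passes in whether a URL is negotiated.

```javascript
// Return a list of header problems for one media response.
function auditMediaHeaders(headers, { expectedType, negotiated }) {
  const problems = [];
  // Compare the media type only, ignoring parameters such as codecs=.
  const type = (headers['content-type'] || '').split(';')[0].trim().toLowerCase();
  if (type !== expectedType) {
    problems.push(`content-type is "${type || '(missing)'}", expected "${expectedType}"`);
  }
  const vary = (headers.vary || '').split(',').map((v) => v.trim().toLowerCase());
  if (negotiated && !vary.includes('accept')) {
    problems.push('negotiated URL is missing Vary: Accept');
  }
  const cacheControl = (headers['cache-control'] || '').toLowerCase();
  if (!/max-age=\d+/.test(cacheControl)) problems.push('cache-control has no max-age');
  return problems;
}
```

An empty array means the response passed. Log or fail the deploy on anything else.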
Accessibility and progressive enhancement
Progressive enhancement requires that every format tier degrades gracefully: AVIF → WebP → JPEG for images, and AV1/MP4 → VP9/WebM → H.264/MP4 for video. Accessibility requirements apply at every tier:
- Every informative <img> requires a descriptive alt attribute. Decorative images (background textures, purely visual dividers) use alt="" and optionally aria-hidden="true" to remove them from the accessibility tree.
- Animated media (GIF replacements, background loops) must respect @media (prefers-reduced-motion: reduce). Either pause the animation or replace it with a static poster image.
- Video content that conveys information must include synchronized captions (WebVTT via <track kind="captions">) and audio descriptions for visual-only content.
- Inline SVG diagrams need role="img" and an accessible name on the root element. Use either a <title> (with an optional <desc>) referenced via aria-labelledby, or an aria-label. Interactive SVGs (zoom, toggle) require keyboard focus and keyboard event handlers.
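For background loops, the reduced-motion rule can be wired up in a few lines. This is a sketch with a hypothetical `bindReducedMotion` helper. In a page you would pass a `<video>` element and `matchMedia('(prefers-reduced-motion: reduce)')`. It only pauses; it does not auto-resume when the preference is cleared, so a user who paused the loop is not overridden.

```javascript
// Pause a decorative video whenever reduced motion is requested, including
// if the preference changes while the page is open. Returns an unbind function.
function bindReducedMotion(video, mediaQueryList) {
  const apply = () => {
    if (mediaQueryList.matches) {
      video.pause();
      video.removeAttribute('autoplay'); // stop later re-insertions from autoplaying
    }
  };
  apply(); // honour the preference at load time
  mediaQueryList.addEventListener('change', apply);
  return () => mediaQueryList.removeEventListener('change', apply);
}
```

Pair this with a poster attribute on the video so that paused users still see a meaningful still frame.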
Related
- AVIF vs WebP Compression Benchmarks — SSIM/VMAF scoring and file-size data
- MIME Type Configuration for Modern Media Servers — AVIF, WebP, WebM on Nginx and Apache
- Cache-Control Headers for Image and Video Assets — max-age, immutable, Vary
- Understanding Video Codecs: VP9 vs H.265 vs AV1 — hardware decode matrix and licensing
- Lazy Loading, Preloading & Fetch Priorities — IntersectionObserver and fetchpriority patterns
- Responsive Image & Video Delivery — srcset, sizes, and art direction with the picture element