Responsive Image & Video Delivery
Serving the right image or video (at the right resolution, in the right format, at the right moment) is one of the highest-leverage optimisations available to a frontend performance engineer. A single poorly tuned hero image can push Largest Contentful Paint past the 2.5 s threshold on mid-range mobile devices; an oversized background video on a 4G connection can consume as much as 15× the bandwidth a compressed AV1 variant would. Responsive media delivery addresses both problems through a layered system: browser-side negotiation via srcset and sizes, server-side format selection via Accept headers, pipeline automation through build-time encoding and CDN edge logic, and runtime prioritisation via fetchpriority and IntersectionObserver. Getting all four layers right can reduce median LCP by 40–60 % and bandwidth cost by a similar margin, without a visible loss in quality.
What this section covers
This section examines every layer of the responsive media stack. The clusters below map to the four main decision surfaces an engineer encounters:
Mastering srcset and sizes for Responsive Layouts covers the browser’s resource-selection algorithm in detail — how it evaluates the srcset descriptor list against the computed sizes hint, when it ignores sizes entirely, and how to author descriptor sets that keep waste below 5 % across the real-world viewport distribution you care about. The companion page on calculating optimal sizes attribute values walks through the maths with concrete examples.
Art Direction with the HTML Picture Element steps beyond resolution switching into format negotiation and crop-level composition control. The <picture> element lets you serve a tightly-cropped portrait variant on narrow viewports and a wide-angle landscape on desktop — without any JavaScript — while the type attribute on each <source> drives the format fallback chain (AVIF → WebP → JPEG).
CSS Container Queries for Dynamic Media Sizing addresses the architectural limitation of viewport-relative sizes: in a component-driven codebase, a card component does not know how wide its container will be at render time. Container queries expose a cqi unit and @container rules that let the component author specify media dimensions relative to the component’s own inline size rather than the viewport, eliminating the need to thread breakpoint knowledge down through prop hierarchies.
Responsive Video Delivery in Next.js and React covers framework-specific patterns: the next/image component’s built-in srcset generation, lazy video initialisation with IntersectionObserver, poster optimisation to prevent layout shift during buffering, and playback-state management for background video loops. The detail pages on implementing responsive video with Video.js and using next/image with custom loader configurations extend this into third-party player integration and CDN-specific loader setup.
Core theory: how browsers negotiate format and resolution
The srcset resource selection algorithm
When a browser parses an <img srcset="...">, it does not simply pick the largest variant that fits. The selection algorithm, defined in the WHATWG HTML Living Standard, runs as follows:
1. Evaluate sizes to compute the image’s layout width in CSS pixels. If sizes is absent, it defaults to 100vw.
2. Multiply the layout width by devicePixelRatio to get the required source width in physical pixels.
3. Walk the srcset descriptor list and select the narrowest candidate that is at least as wide as the required source width. If no candidate is wide enough, the widest one wins. The standard leaves the final choice to the user agent, so treat this as typical behaviour rather than a guarantee.
4. Reuse what is already loaded where possible. Browsers may re-run selection when the viewport changes, and engines differ in how they do it; Chromium-based browsers, for example, generally keep an already-cached wider variant when the viewport shrinks instead of fetching a narrower one.
The critical implication of step 4 is that srcset only reliably prevents over-fetching on first load. A user who starts on a wide viewport typically retains the wide asset after narrowing the window, and whether a user who starts narrow and then widens receives a larger variant depends on the browser. This asymmetry shapes how you structure your descriptor set: make sure the narrowest breakpoints have a candidate close to their required width rather than over-optimising for the widest.
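The first three steps can be modelled in a few lines of JavaScript. This is a minimal sketch of the typical behaviour described above, not the logic of any particular browser; the function name pickCandidate and the candidate object shape are illustrative assumptions.

```javascript
// Illustrative model of width-descriptor selection.
// Real browsers may apply extra heuristics (cache state, rounding, data-saver modes).
function pickCandidate(candidates, slotWidthCssPx, devicePixelRatio) {
  if (candidates.length === 0) return null;
  // Step 2: required source width in physical pixels.
  const requiredWidth = slotWidthCssPx * devicePixelRatio;
  // Step 3: narrowest candidate that is wide enough, else the widest available.
  const sorted = [...candidates].sort((a, b) => a.width - b.width);
  const match = sorted.find((c) => c.width >= requiredWidth);
  return match ?? sorted[sorted.length - 1];
}

// Hypothetical candidate list mirroring a 400w/800w/1200w/1600w srcset.
const heroCandidates = [400, 800, 1200, 1600].map((width) => ({
  url: `/media/hero-${width}.avif`,
  width,
}));

console.log(pickCandidate(heroCandidates, 375, 2).url); // /media/hero-800.avif
console.log(pickCandidate(heroCandidates, 800, 3).url); // /media/hero-1600.avif
```

The second call shows the fallback branch: a slot of 800 CSS px at a device pixel ratio of 3 needs 2400 physical pixels, no candidate is that wide, so the widest one is returned.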
Format negotiation via Accept headers
The <picture> / <source type="..."> API is client-driven: the browser picks the first <source> whose type MIME type it supports. This is simple and reliable, but it has one drawback — the browser must parse and evaluate the entire <picture> element before making a network request, adding a small parse cost relative to a bare <img>.
The server-driven alternative is Accept header negotiation. The browser sends an Accept header listing the image formats it can decode (in Chrome, as of early 2025, the list begins with image/avif,image/webp), and an edge worker or Nginx map directive inspects this header and rewrites the image URL to the appropriate format before fetching from origin. This approach serves a single <img src> tag and handles fallback silently, but it requires correct caching headers: every response must carry Vary: Accept so that CDN caches maintain separate entries per format. Omitting Vary: Accept lets a cached AVIF response be served to a browser that never advertised AVIF support, which breaks the image in Safari 14.
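As a rough illustration of the server-driven path, the decision an edge worker makes can be reduced to a pure function over the Accept header. This is a simplified sketch: the function names are hypothetical, and it ignores q-values, which a production implementation would need to respect.

```javascript
// Pick the best image format the client advertises, preferring AVIF, then WebP.
// Simplification: substring match only; q-values (e.g. image/avif;q=0) are ignored.
function pickImageFormat(acceptHeader) {
  const accept = (acceptHeader || '').toLowerCase();
  if (accept.includes('image/avif')) return 'avif';
  if (accept.includes('image/webp')) return 'webp';
  return 'jpeg';
}

// Headers that must accompany a negotiated response so that shared caches
// store one entry per Accept value instead of one entry per URL.
function negotiatedHeaders(format) {
  return {
    'Content-Type': `image/${format}`,
    'Vary': 'Accept',
  };
}

console.log(pickImageFormat('image/avif,image/webp,*/*')); // avif
console.log(pickImageFormat('image/webp,*/*'));            // webp
console.log(pickImageFormat('*/*'));                       // jpeg
```

Keeping the decision in a pure function makes it easy to unit-test against the Accept headers you actually observe in your logs before wiring it into an edge runtime.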
Chroma subsampling and perceptual quality
AVIF — derived from the AV1 video codec — stores colour using the YCbCr model and typically applies 4:2:0 chroma subsampling: full luma resolution, half horizontal and vertical chroma resolution. For photographic content, 4:2:0 is invisible at normal viewing distances. For text overlays, UI screenshots, and flat-colour graphics, 4:2:0 introduces visible colour fringing. These assets should be encoded with 4:4:4 subsampling (avifenc --yuv 444) at the cost of a 15–25 % larger file.
WebP uses a similar YCbCr scheme with fixed 4:2:0 subsampling on the lossy path, while its lossless path stores full-resolution colour. JPEG offers encoder-level control: with mozjpeg’s cjpeg, -sample 1x1 forces 4:4:4. PNG is always full-colour (no subsampling) and is the correct fallback for assets where chroma accuracy is non-negotiable.
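The subsampling ratios translate directly into how many samples the encoder has to store before compression. The sketch below counts raw samples per frame for the common schemes; the function name and lookup table are illustrative, and the figures describe uncompressed sample counts, not final file sizes.

```javascript
// Horizontal and vertical chroma divisors for common subsampling schemes.
const CHROMA_DIVISORS = {
  '4:4:4': [1, 1], // full-resolution chroma
  '4:2:2': [2, 1], // half horizontal chroma resolution
  '4:2:0': [2, 2], // half horizontal and half vertical chroma resolution
};

// Raw sample count for one frame: one luma plane plus two chroma planes.
function samplesPerFrame(width, height, scheme) {
  const divisors = CHROMA_DIVISORS[scheme];
  if (!divisors) throw new Error(`Unknown subsampling scheme: ${scheme}`);
  const [h, v] = divisors;
  const luma = width * height;
  const chromaPlane = Math.ceil(width / h) * Math.ceil(height / v);
  return luma + 2 * chromaPlane;
}

console.log(samplesPerFrame(1200, 630, '4:4:4')); // 2268000
console.log(samplesPerFrame(1200, 630, '4:2:0')); // 1134000
```

A 4:2:0 frame carries half the raw samples of a 4:4:4 frame, yet the compressed size difference quoted above is much smaller (15–25 %), because chroma planes compress far better than luma.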
Video codec selection mechanics
Codec selection for video follows the same <source type> pattern. The browser evaluates each <source> in order and picks the first whose type it reports it can play. That check covers container and codec support; it does not guarantee hardware decoding. The negotiation is purely type-based: there is no equivalent of the image sizes hint for video bandwidth. This means the server must pre-generate multiple bitrate variants (typically three) and use a manifest-based adaptive streaming protocol (HLS or MPEG-DASH) if the video is longer than approximately 30 seconds. For short background loops, static multi-source <video> is sufficient.
Understanding VP9, H.265, and AV1 codec characteristics covers encode-time cost, decode hardware support, and compression efficiency in detail. The short version for production decisions: ship AV1 as the first <source> for maximum compression, VP9 as the second for broad Android and desktop Chrome coverage, and H.264 (baseline or main profile, +faststart) as the final fallback for Safari 14 and older Android WebView.
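The source-order rule can be sketched as a function that walks the list and returns the first playable entry. The support check is injected so the logic can be exercised outside a browser; in a page you would pass a wrapper around HTMLMediaElement.canPlayType, which returns an empty string for unsupported types. The file names and codec strings below are illustrative assumptions, not values taken from a real deployment.

```javascript
// Return the first source whose type the support check accepts, or null.
// canPlayType follows the browser convention: '' means "cannot play".
function pickVideoSource(sources, canPlayType) {
  for (const source of sources) {
    if (canPlayType(source.type) !== '') return source;
  }
  return null;
}

// Hypothetical source list in the recommended order: AV1, then VP9, then H.264.
const loopSources = [
  { src: '/media/loop_av1.mp4', type: 'video/mp4; codecs="av01.0.05M.08"' },
  { src: '/media/loop_vp9.webm', type: 'video/webm; codecs="vp9"' },
  { src: '/media/loop_h264.mp4', type: 'video/mp4; codecs="avc1.4d4028"' },
];

// Stand-in for a browser that supports only H.264.
const h264Only = (type) => (type.includes('avc1') ? 'probably' : '');

console.log(pickVideoSource(loopSources, h264Only).src); // /media/loop_h264.mp4
```

In a browser, the second argument could be (type) => document.createElement('video').canPlayType(type).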
Reference data: format and codec comparison
The table below reflects encode benchmarks on a standard photographic test set (Kodak dataset, 24 images at 768 × 512, roughly 0.4 MP each) and a 30 s 1080p video clip encoded at target VMAF 93.
| Format / Codec | Median file size vs JPEG/H.264 baseline | SSIM at matched quality | Hardware decode support | Encode time (per image / per minute of video) |
|---|---|---|---|---|
| JPEG (baseline) | 100 % (reference) | 0.93 | Universal | ~20 ms / — |
| WebP lossy (q 80) | −28 % | 0.95 | CPU-only on older SoCs | ~45 ms / — |
| AVIF (crf 32, 4:2:0) | −48 % | 0.95 | Browser decode: Chrome 85+, Safari 16+, iOS 16+ (typically in software) | ~380 ms / — |
| AVIF (crf 32, 4:4:4) | −33 % | 0.96 | same as above | ~460 ms / — |
| H.264 (crf 23) | 100 % (reference) | — | Universal | — / ~2× real-time |
| VP9 (crf 33) | −34 % | — | HW: Snapdragon 855+, most x86 | — / ~6× real-time |
| AV1 / SVT-AV1 (crf 30) | −47 % | — | HW: Apple M3 / A17 Pro and later, newer Android SoCs; software decode elsewhere | — / ~3× real-time (SVT preset 6) |
Key observation: AVIF’s 48 % size saving over JPEG is the largest available for raster images with equivalent perceived quality. The encoding penalty (≈19× slower than JPEG) is paid at build time or in an edge worker, not by the user. AVIF should be the first <source> for all photographic content on sites targeting Chrome 85+ and Safari 16+.
Browser and CDN compatibility matrix
| Feature | Safari 14 | Safari 16 | Chrome 85+ | Firefox 93+ | Edge 18+ | Cloudflare | Fastly | AWS CloudFront |
|---|---|---|---|---|---|---|---|---|
| srcset w descriptor | Yes | Yes | Yes | Yes | Yes | Pass-through | Pass-through | Pass-through |
| <picture> / <source type> | Yes | Yes | Yes | Yes | Yes | Pass-through | Pass-through | Pass-through |
| AVIF decode | No | Yes | Yes (85+) | Yes (93+) | Yes (121+) | Image Resizing | No native | No native |
| WebP decode | Yes (14+) | Yes | Yes | Yes | Yes (18+) | Image Resizing | Partial VCL | Lambda@Edge |
| fetchpriority attribute | No | No (17.2+) | Yes (101+) | Yes (132+) | Yes (101+) | — | — | — |
| CSS @container queries | No | Yes (16+) | Yes (105+) | Yes (110+) | Yes (105+) | — | — | — |
| Native loading="lazy" | No (14) | Yes (15.4+) | Yes (77+) | Yes (75+) | Yes (79+) | — | — | — |
| AV1 video decode | No | No | Yes (85+, desktop) | Yes (93+) | Yes (94+) | — | — | — |
| VP9 video decode | Partial | Yes (16+) | Yes | Yes | Yes | — | — | — |
Safari 14 notes: No AVIF, no native lazy loading, no fetchpriority. Serve WebP via the <picture> fallback; use an IntersectionObserver-based fallback for lazy loading (the observer API itself is available natively in Safari 14); fetchpriority can stay in the markup because the attribute is safely ignored, but the image will not receive elevated priority in WebKit’s resource scheduler on this version.
Canonical code pattern: production <picture> with full fallback chain
<!--
Production-ready responsive image with format fallback and layout stability.
Place this pattern for every LCP candidate (hero images, above-fold cards).
-->
<picture>
<!--
AVIF source: best compression, ~48 % smaller than JPEG at matched quality.
srcset uses w-descriptors so the browser can pick the optimal width variant.
sizes tells the browser the rendered width at each breakpoint BEFORE layout
is computed — this is the only hint the preload scanner reads.
Omitting sizes defaults to 100vw, causing the browser to over-fetch
on multi-column layouts.
-->
<source
type="image/avif"
srcset="
/media/hero-400.avif 400w,
/media/hero-800.avif 800w,
/media/hero-1200.avif 1200w,
/media/hero-1600.avif 1600w
"
sizes="
(max-width: 600px) 100vw,
(max-width: 1200px) 50vw,
800px
"
>
<!--
WebP fallback: covers Safari 14+, older Chrome/Firefox.
Must duplicate the srcset — browsers evaluate <source> elements top-to-bottom
and stop at the first type match; they do NOT fall back within a type.
-->
<source
type="image/webp"
srcset="
/media/hero-400.webp 400w,
/media/hero-800.webp 800w,
/media/hero-1200.webp 1200w,
/media/hero-1600.webp 1600w
"
sizes="
(max-width: 600px) 100vw,
(max-width: 1200px) 50vw,
800px
"
>
<!--
JPEG ultimate fallback: always present, covers IE 11, older Safari, crawlers.
width + height are MANDATORY: they establish the aspect ratio before the
image loads, preventing CLS. Use the intrinsic pixel dimensions of the src
file (here 1200 x 630); the browser only needs the ratio to reserve space.
loading="eager" + fetchpriority="high" puts this in the highest-priority
queue in Blink. Use ONLY on the primary LCP element; applying fetchpriority=high
to multiple images starves CSS and other critical resources.
decoding="async" asks the browser to decode off the main thread so decode work
does not block painting. It is a hint to the browser, not a guarantee.
-->
<img
src="/media/hero-1200.jpg"
alt="Dashboard analytics visualization showing real-time media delivery metrics"
width="1200"
height="630"
loading="eager"
fetchpriority="high"
decoding="async"
>
</picture>
Pipeline integration
Build-time image encoding with Sharp
Sharp wraps libvips and runs in a Node.js CI step. The key pipeline concern is generating a consistent srcset manifest alongside the variant files so the HTML can reference the correct filenames.
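// scripts/generate-image-variants.mjs
// Run in CI before the 11ty/Next.js build step.
import sharp from 'sharp';
import { mkdirSync, writeFileSync } from 'fs';

const OUT_DIR = 'public/media';
const WIDTHS = [400, 800, 1200, 1600];
const FORMATS = [
  {
    ext: 'avif',
    opts: {
      quality: 60,               // AVIF quality scale is 1–100 (higher = better quality, larger file)
      effort: 4,                 // Encode effort 0–9; 4 is sharp's default and balances CI time vs compression
      chromaSubsampling: '4:2:0' // Use '4:4:4' for UI screenshots with text overlays
    }
  },
  {
    ext: 'webp',
    opts: {
      quality: 82,          // WebP quality 82 ≈ JPEG quality 85 perceptually
      effort: 4,            // WebP effort range is 0–6
      smartSubsample: true  // Preserves colour accuracy near hard edges
    }
  },
  {
    ext: 'jpg',
    opts: {
      quality: 85,
      progressive: true, // Progressive JPEG renders in passes of increasing detail, improving perceived speed
      mozjpeg: true      // Enables mozjpeg defaults; ~5-10 % smaller than libjpeg at same quality
    }
  }
];
const srcImages = ['src/images/hero.jpg', 'src/images/team.jpg'];

// Manifest shape (an assumption of this script): base name -> extension -> [{ width, path }]
const manifest = {};

// toFile() does not create missing directories, so make sure the target exists.
mkdirSync(OUT_DIR, { recursive: true });

for (const src of srcImages) {
  const base = src.replace(/^src\/images\//, '').replace(/\.\w+$/, '');
  const pipeline = sharp(src);
  manifest[base] = {};
  for (const { ext, opts } of FORMATS) {
    manifest[base][ext] = [];
    for (const width of WIDTHS) {
      // clone() reuses the decoded input so each variant does not re-read the source.
      await pipeline
        .clone()
        .resize(width)
        .toFormat(ext, opts)
        .toFile(`${OUT_DIR}/${base}-${width}.${ext}`);
      manifest[base][ext].push({ width, path: `/media/${base}-${width}.${ext}` });
    }
  }
}

// Write the srcset manifest next to the variants so templates can look up filenames.
// The manifest.json filename is a placeholder; use whatever your templates expect.
writeFileSync(`${OUT_DIR}/manifest.json`, JSON.stringify(manifest, null, 2));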
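Once the variants exist, the srcset attribute value can be generated from the same width list instead of being typed by hand. The helper below is a small sketch that assumes the base-width.ext naming used by the script above; the function name buildSrcset and the /media prefix default are illustrative.

```javascript
// Build a srcset attribute value from a base name, extension, and width list.
// Assumes files are published as <prefix>/<base>-<width>.<ext>.
function buildSrcset(base, ext, widths, prefix = '/media') {
  return widths
    .map((width) => `${prefix}/${base}-${width}.${ext} ${width}w`)
    .join(', ');
}

console.log(buildSrcset('hero', 'avif', [400, 800]));
// /media/hero-400.avif 400w, /media/hero-800.avif 800w
```

Deriving the markup and the encoded files from one width list keeps them from drifting apart when a breakpoint is added or removed.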
Build-time video encoding with FFmpeg
The two-pass VP9 + AV1 pipeline below covers the full codec fallback chain. The H.264 variant is always the final fallback — encode it last so its simpler parameter set does not accidentally become the template for the more complex VP9/AV1 commands.
#!/usr/bin/env bash
# encode-video-variants.sh
# Produces three codec variants for a short background loop (< 60 s).
# For longer content, generate multi-bitrate HLS with ffmpeg -hls_segment_type fmp4.
set -euo pipefail   # Stop on the first failed encode instead of continuing silently

INPUT="$1"
BASE="${INPUT%.*}"

# --- AV1 via SVT-AV1 (libsvtav1) ---
# preset 6: fast enough for CI; preset 4–5 for maximum compression offline.
# crf 30: target quality (lower = higher quality, larger file; range 0–63 for SVT-AV1).
# Output container is MP4 here; AV1 can also be muxed into WebM if you prefer.
# -movflags +faststart moves the moov atom to the file head for progressive playback.
# Comments must not follow a trailing backslash, so they sit above the command.
# -y overwrites existing outputs so the script never waits for a prompt in CI.
ffmpeg -y -i "$INPUT" \
  -c:v libsvtav1 -preset 6 -crf 30 \
  -c:a libopus -b:a 96k \
  -movflags +faststart \
  "${BASE}_av1.mp4"

# --- VP9 (two-pass, constant quality) ---
# -b:v 0 together with -crf selects constant-quality mode. The first pass gathers
# statistics that let the second pass allocate bits more efficiently, which
# typically yields smaller files than single-pass CRF at the same quality target.
ffmpeg -y -i "$INPUT" \
  -c:v libvpx-vp9 -b:v 0 -crf 33 \
  -pass 1 -an -f null /dev/null && \
ffmpeg -y -i "$INPUT" \
  -c:v libvpx-vp9 -b:v 0 -crf 33 \
  -c:a libopus -b:a 96k \
  -pass 2 \
  "${BASE}_vp9.webm"

# --- H.264 (single-pass CRF, faststart for web) ---
# -preset medium balances encode speed vs compression efficiency.
# -profile:v main covers all modern devices; change to baseline only for legacy Android 4.x.
ffmpeg -y -i "$INPUT" \
  -c:v libx264 -preset medium -crf 23 \
  -profile:v main -level 4.0 \
  -c:a aac -b:a 128k \
  -movflags +faststart \
  "${BASE}_h264.mp4"
Edge-side format negotiation (Nginx)
For origin servers that prefer server-driven negotiation over <picture> markup, the following Nginx map block routes requests to the correct format based on the Accept header. This requires correctly configured Vary: Accept response headers — without them, a CDN in front will serve a cached AVIF response to a browser that only sent image/jpeg in its Accept header.
# nginx.conf: image format negotiation via Accept header
# Requires: pre-generated AVIF and WebP variants at the same path with .avif / .webp suffix.
# Also check that mime.types maps avif to image/avif and webp to image/webp so the
# Content-Type matches the file actually served (older nginx packages may lack the avif entry).
http {
    # Map Accept header to best supported format.
    # Order matters: regex entries are tested in order, so AVIF is checked first,
    # then WebP; the default (empty suffix) serves the original JPEG or PNG.
    map $http_accept $img_suffix {
        default        "";
        "~*image/avif" ".avif";  # Chrome 85+, Firefox 93+, Safari 16+
        "~*image/webp" ".webp";  # Safari 14+, Chrome, Firefox, Edge
    }
    server {
        location ~* \.(jpe?g|png)$ {
            # Try the format-suffixed path first; fall back to the original.
            # $uri$img_suffix resolves to e.g. /media/hero.jpg.avif when AVIF is accepted.
            try_files $uri$img_suffix $uri =404;
            # CRITICAL: Vary: Accept is mandatory.
            # Without it, a CDN caches the first response (AVIF or WebP) and serves it
            # to ALL clients regardless of their Accept header, breaking older browsers.
            add_header Vary Accept;
            # Long cache lifetime is safe because filenames are content-hashed.
            # Use a shorter max-age (e.g. 86400) if files are not content-hashed.
            add_header Cache-Control "public, max-age=31536000, immutable";
        }
    }
}
Tradeoffs and failure modes
| Failure mode | Trigger condition | Mitigation |
|---|---|---|
| CDN cache poisoning from a missing Vary: Accept | Vary: Accept header missing on image responses | Always set Vary: Accept; verify with curl -sI and inspect the response headers |
| AVIF encode time blows CI budget | Large image set, effort ≥ 6 | Reduce effort to 4; run encoding in a dedicated parallel step; cache build artefacts between CI runs |
| fetchpriority=high starvation | Applied to more than one image per page | Reserve fetchpriority=high for the single LCP candidate; all other images use the default |
| IntersectionObserver rootMargin over-eager | Large positive rootMargin on fast scrollers | Start at '200px'; measure buffering events in RUM; tighten to '50px' if bandwidth is constrained |
| AVIF served to Safari 14 (no <picture>) | Server-side negotiation without Accept check | Safari 14’s Accept header includes image/webp but never image/avif; the Nginx map above handles this correctly |
| loading="lazy" ignored on Safari 14 | Native lazy loading unsupported | Detect support with 'loading' in HTMLImageElement.prototype; fall back to IntersectionObserver (see advanced IntersectionObserver patterns) |
| CLS from missing width/height | Dimensions omitted from <img> | Always set width and height attributes to the image’s intrinsic dimensions; let CSS max-width: 100% with height: auto handle responsiveness |
| @container query fallback gap | Container queries unsupported (pre-Chrome 105) | Write a baseline viewport media query first; @container will override it in supporting browsers via the cascade |
| VP9 hardware decode absent on mid-range Android | Devices pre-Snapdragon 855 | Include H.264 as the final <source> fallback; test on Moto G series in BrowserStack |
Debugging and performance telemetry
Identifying the LCP element
Open Chrome DevTools, run a Lighthouse audit, and expand the “Largest Contentful Paint element” diagnostic. The element path will confirm whether the LCP candidate is your intended hero <img> or something else (a background <div>, a <video> poster, etc.). If the LCP element is a CSS background image, the fetchpriority attribute cannot be applied to it and the preload scanner cannot see it: the asset only becomes discoverable once the stylesheet has been downloaded and parsed. Convert it to an inline <img>, or add an explicit <link rel="preload" as="image" fetchpriority="high"> in <head> so the fetch starts early. For more on prioritising critical assets, see using fetchpriority to optimise critical media.
Verifying format negotiation
# Confirm AVIF is being served to a Chrome-like Accept header.
# grep -i is needed because header names are capitalised under HTTP/1.1 and lowercase under HTTP/2.
curl -sI -H "Accept: image/avif,image/webp,*/*" https://example.com/media/hero.jpg \
  | grep -iE "content-type|vary|cache-control"
# Expected output (server-side negotiation):
# content-type: image/avif
# vary: Accept
# cache-control: public, max-age=31536000, immutable

# Confirm that a Safari 14-style Accept header (no image/avif) never receives AVIF
curl -sI -H "Accept: image/webp,*/*" https://example.com/media/hero.jpg \
  | grep -i content-type
# Expected: content-type: image/webp (or image/jpeg if no WebP variant exists)
RUM telemetry targets
Instrument your RUM pipeline (web-vitals.js or equivalent) to capture:
- LCP: target < 2.5 s on the 75th percentile across mobile connections.
- CLS: target < 0.1; any layout shift above 0.05 during image load indicates missing width/height.
- INP: target < 200 ms; synchronous image decode on the main thread is a common contributor; ensure decoding="async" on all images.
- Cache hit rate: monitor CDN hit ratio per MIME type; a drop in AVIF hit rate often signals a Vary: Accept misconfiguration or a deploy that cleared the format-keyed cache entries.
When LCP exceeds 2.5 s at the 75th percentile, the usual causes to check first are: no fetchpriority=high on the LCP image, the LCP image discovered late (CSS background, dynamically injected <img>), an over-fetched LCP image caused by a missing or inaccurate sizes attribute, and origin TTFB exceeding 600 ms due to absent CDN caching.
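To check the 75th-percentile target against raw samples, a nearest-rank percentile is enough for a first pass. This is a minimal sketch: the function name and sample values are illustrative, and RUM products may use interpolation or histogram buckets, so their figures can differ slightly.

```javascript
// Nearest-rank percentile: the smallest sample such that at least p % of
// samples are less than or equal to it.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.max(1, Math.ceil((p / 100) * sorted.length));
  return sorted[rank - 1];
}

// Hypothetical LCP samples in milliseconds from one page template.
const lcpSamplesMs = [1800, 2100, 2400, 3900];
const p75 = percentile(lcpSamplesMs, 75);

console.log(p75);         // 2400
console.log(p75 < 2500);  // true
```

Note that the single 3.9 s outlier does not fail the target here; the 75th percentile deliberately tolerates a slow tail, which is why it should be tracked per connection type rather than as one global figure.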
Related
- Mastering srcset and sizes for Responsive Layouts: how the browser’s resource-selection algorithm works and how to author descriptor sets that minimise waste
- Art Direction with the HTML Picture Element: crop-level composition control and format negotiation via <picture>/<source type>
- CSS Container Queries for Dynamic Media Sizing: component-relative media sizing that decouples breakpoints from the viewport
- Responsive Video Delivery in Next.js and React: framework-native patterns for lazy video, poster optimisation, and playback state
- Cache-Control headers for image and video assets: max-age, immutable, and the Vary: Accept requirement for format-negotiated responses
- Using fetchpriority to optimise critical media: elevating the LCP candidate’s fetch priority without starving other critical resources