Responsive Image & Video Delivery

Serving the right image or video (at the right resolution, in the right format, at the right moment) is one of the highest-leverage optimisations available to a frontend performance engineer. A single poorly tuned hero image can push Largest Contentful Paint past the 2.5 s threshold on mid-range mobile devices; an oversized background video on a 4G connection can consume up to 15× the bandwidth a compressed AV1 variant would. Responsive media delivery solves both problems through a layered system: browser-side negotiation via srcset and sizes, server-side format selection via Accept headers, pipeline automation through build-time encoding and CDN edge logic, and runtime prioritisation via fetchpriority and IntersectionObserver. Getting all four layers right can reduce median LCP by 40–60 % and bandwidth cost by a similar margin without a visible loss of quality; the actual gain depends on how unoptimised the starting point is.

What this section covers

This section examines every layer of the responsive media stack. The clusters below map to the four main decision surfaces an engineer encounters:

Mastering srcset and sizes for Responsive Layouts covers the browser’s resource-selection algorithm in detail — how it evaluates the srcset descriptor list against the computed sizes hint, when it ignores sizes entirely, and how to author descriptor sets that keep waste below 5 % across the real-world viewport distribution you care about. The companion page on calculating optimal sizes attribute values walks through the maths with concrete examples.

Art Direction with the HTML Picture Element steps beyond resolution switching into format negotiation and crop-level composition control. The <picture> element lets you serve a tightly-cropped portrait variant on narrow viewports and a wide-angle landscape on desktop — without any JavaScript — while the type attribute on each <source> drives the format fallback chain (AVIF → WebP → JPEG).

CSS Container Queries for Dynamic Media Sizing addresses the architectural limitation of viewport-relative sizes: in a component-driven codebase, a card component does not know how wide its container will be at render time. Container queries expose a cqi unit and @container rules that let the component author specify media dimensions relative to the component’s own inline size rather than the viewport, eliminating the need to thread breakpoint knowledge down through prop hierarchies.

Responsive Video Delivery in Next.js and React covers framework-specific patterns: the next/image component’s built-in srcset generation, lazy video initialisation with IntersectionObserver, poster optimisation to prevent layout shift during buffering, and playback-state management for background video loops. The detail pages on implementing responsive video with Video.js and using next/image with custom loader configurations extend this into third-party player integration and CDN-specific loader setup.

Core theory: how browsers negotiate format and resolution

The srcset resource selection algorithm

When a browser parses an <img srcset="...">, it does not simply pick the largest variant that fits. The WHATWG HTML Living Standard defines how the candidate list and the source size are computed, but leaves the final choice among candidates implementation-defined. In practice, current browsers behave roughly as follows:

Evaluate sizes to compute the image’s layout width in CSS pixels. If sizes is absent, it defaults to 100vw.
Multiply the layout width by devicePixelRatio to get the required source width in physical pixels.
Walk the srcset descriptor list and select the narrowest candidate that is at least as wide as the required source width. If no candidate is wide enough, the widest one wins.
Prefer what is already cached. Once a wide candidate has been fetched, browsers such as Chrome will not downgrade to a narrower variant when the viewport shrinks. If the viewport grows past the current candidate, however, most browsers re-run selection and may fetch a wider one.

The critical implication of step 4 is that selection is asymmetric. Users who open a page on a narrow viewport and then widen the window may trigger a second, larger download; those who start on a wide viewport retain the wide asset even when they later narrow the window. The initial selection therefore carries most of the cost, which is why an accurate sizes value and sensibly spaced width descriptors matter more than tuning for resize behaviour.
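The first three selection steps above can be modelled in a few lines. This is a simplified sketch, not the browser's actual implementation: the function names are hypothetical, and real engines may also weigh cache state and network conditions, which are ignored here.

```javascript
// Simplified model of width-descriptor selection (steps 1-3 only).
// candidateWidths: the "w" descriptors from srcset, in any order.
// layoutWidthCssPx: the width that the sizes attribute resolves to.
function pickSrcsetCandidate(candidateWidths, layoutWidthCssPx, devicePixelRatio) {
  if (candidateWidths.length === 0) {
    throw new Error('candidateWidths must not be empty');
  }
  const required = layoutWidthCssPx * devicePixelRatio;
  const sorted = [...candidateWidths].sort((a, b) => a - b);
  // Narrowest candidate that still covers the required width...
  const match = sorted.find((w) => w >= required);
  // ...otherwise fall back to the widest one available.
  return match ?? sorted[sorted.length - 1];
}

// Extra width fetched relative to what was needed (0 means an exact fit).
function overfetchRatio(chosenWidth, layoutWidthCssPx, devicePixelRatio) {
  const required = layoutWidthCssPx * devicePixelRatio;
  return Math.max(0, chosenWidth / required - 1);
}

const widths = [400, 800, 1200, 1600];
console.log(pickSrcsetCandidate(widths, 375, 2)); // 750 px needed, picks 800
console.log(pickSrcsetCandidate(widths, 800, 3)); // 2400 px needed, picks 1600 (widest)
console.log(overfetchRatio(800, 375, 2).toFixed(3)); // about 0.067
```

Running a model like this against the viewport widths and pixel ratios in your analytics is one way to check how much width a given descriptor set wastes before committing to it.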

Format negotiation via Accept headers

The <picture> / <source type="..."> API is client-driven: the browser picks the first <source> whose type MIME type it supports. This is simple and reliable, but it has one drawback — the browser must parse and evaluate the entire <picture> element before making a network request, adding a small parse cost relative to a bare <img>.

The server-driven alternative is Accept header negotiation. The browser advertises the formats it can decode in the Accept request header (Chrome’s image requests list image/avif and image/webp ahead of */*, as of early 2025), and an edge worker or Nginx map directive inspects this header and rewrites the image URL to the appropriate format before fetching from origin. This approach serves a single <img src> tag and handles fallback silently, but it requires correct caching headers: every response must carry Vary: Accept so that CDN caches maintain separate entries per format. Omitting Vary: Accept can cause a cached AVIF response to be served to a browser that never advertised AVIF support, which breaks the image in Safari 14.
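As a rough illustration of the server-driven approach, the sketch below maps an Accept header to a pre-generated variant and always attaches Vary: Accept. The function name and return shape are hypothetical, and q-values are deliberately ignored for brevity (a production negotiator should honour them).

```javascript
// Hypothetical helper for an edge worker or origin handler.
// Returns the file extension to serve plus the headers the response needs.
function negotiateImageFormat(acceptHeader) {
  // Split "image/avif,image/webp;q=0.9,*/*" into bare media types.
  const accepted = (acceptHeader || '')
    .split(',')
    .map((part) => part.split(';')[0].trim().toLowerCase());

  let ext = 'jpg';
  let contentType = 'image/jpeg';
  if (accepted.includes('image/avif')) {
    ext = 'avif';
    contentType = 'image/avif';
  } else if (accepted.includes('image/webp')) {
    ext = 'webp';
    contentType = 'image/webp';
  }

  return {
    ext,
    headers: {
      'Content-Type': contentType,
      // Without this, a shared cache may hand one client's format to another.
      Vary: 'Accept',
    },
  };
}

console.log(negotiateImageFormat('image/avif,image/webp,*/*').ext); // avif
console.log(negotiateImageFormat('image/webp,*/*').ext);            // webp
console.log(negotiateImageFormat('*/*').ext);                       // jpg
```

The JPEG default matters: a client that sends no Accept header at all, such as some crawlers, still gets an image it can decode.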

Chroma subsampling and perceptual quality

AVIF — derived from the AV1 video codec — stores colour using the YCbCr model and typically applies 4:2:0 chroma subsampling: full luma resolution, half horizontal and vertical chroma resolution. For photographic content, 4:2:0 is invisible at normal viewing distances. For text overlays, UI screenshots, and flat-colour graphics, 4:2:0 introduces visible colour fringing. These assets should be encoded with 4:4:4 subsampling (avifenc --yuv 444) at the cost of a 15–25 % larger file.

WebP uses a similar YCbCr scheme with a fixed 4:2:0 subsampling for the lossy path and 4:4:4 for lossless. JPEG offers encoder-level control: with mozjpeg, cjpeg -sample 1x1 forces 4:4:4. PNG is always full-colour (no subsampling) and is the correct fallback for assets where chroma accuracy is non-negotiable.
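The subsampling notation translates directly into how many raw samples are stored per pixel before compression. The sketch below is a worked calculation under the usual J:a:b reading of the notation; the function name is hypothetical, and the result describes raw sample counts only, not encoded file size.

```javascript
// Raw samples stored per pixel for a J:a:b chroma subsampling notation.
// J is the reference block width (4 in every common scheme), a is the number
// of chroma samples in the first row of the block, b the number in the second.
function samplesPerPixel(notation) {
  const [j, a, b] = notation.split(':').map(Number);
  // Luma keeps one sample per pixel. Each of the two chroma planes keeps
  // (a + b) samples for every block of J x 2 pixels.
  const chromaPerPlane = (a + b) / (2 * j);
  return 1 + 2 * chromaPerPlane;
}

console.log(samplesPerPixel('4:4:4')); // 3
console.log(samplesPerPixel('4:2:2')); // 2
console.log(samplesPerPixel('4:2:0')); // 1.5
```

So 4:2:0 hands the encoder half as many raw samples as 4:4:4. The file-size difference after compression is smaller than that, because chroma planes compress well.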

Video codec selection mechanics

Codec selection for video follows the same <source type> pattern. The browser evaluates each <source> in order and picks the first type it reports it can play, whether that decode happens in hardware or in software. The negotiation is purely type-based: there is no equivalent of the image sizes hint for video bandwidth. This means the server must pre-generate multiple bitrate variants (typically three) and use a manifest-based adaptive streaming protocol (HLS or MPEG-DASH) if the video is longer than approximately 30 seconds. For short background loops, static multi-source <video> is sufficient.
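The in-order, type-based selection described above can be approximated as follows. This is a sketch rather than the browser's real algorithm: the source URLs are hypothetical, and the canPlayType argument is assumed to behave like HTMLMediaElement.canPlayType, returning an empty string for unsupported types and 'maybe' or 'probably' otherwise.

```javascript
// Return the first source whose type the player claims it can play,
// mirroring how <source> children are tried top to bottom.
function pickVideoSource(sources, canPlayType) {
  return sources.find((source) => canPlayType(source.type) !== '') ?? null;
}

// Hypothetical variants, ordered from most to least efficient codec.
const sources = [
  { src: '/media/loop_av1.mp4', type: 'video/mp4; codecs="av01.0.05M.08"' },
  { src: '/media/loop_vp9.webm', type: 'video/webm; codecs="vp09.00.10.08"' },
  { src: '/media/loop_h264.mp4', type: 'video/mp4; codecs="avc1.4d4028"' },
];

// Stand-in for a device that only plays H.264.
const h264Only = (type) => (type.includes('avc1') ? 'probably' : '');

console.log(pickVideoSource(sources, h264Only).src); // /media/loop_h264.mp4
```

In a browser you could pass `(type) => document.createElement('video').canPlayType(type)` as the second argument to see which variant a given device would select.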

Understanding VP9, H.265, and AV1 codec characteristics covers encode-time cost, decode hardware support, and compression efficiency in detail. The short version for production decisions: ship AV1 as the first <source> for maximum compression, VP9 as the second for broad Android and desktop Chrome coverage, and H.264 (baseline or main profile, +faststart) as the final fallback for Safari 14 and older Android WebView.

Reference data: format and codec comparison

The table below reflects encode benchmarks on a standard photographic test set (Kodak dataset, 24 images at 768×512, roughly 0.4 MP each) and a 30 s 1080p video clip encoded at target VMAF 93.

Format / Codec	Median file size vs JPEG/H.264 baseline	SSIM at matched quality	Hardware decode support	Encode time (per image / per minute of video)
JPEG (baseline)	100 % (reference)	0.93	Universal	~20 ms / —
WebP lossy (q 80)	−28 %	0.95	CPU-only on older SoCs	~45 ms / —
AVIF (crf 32, 4:2:0)	−48 %	0.95	Decoded (mostly in software) by Chrome 85+, Safari 16+, iOS 16+	~380 ms / —
AVIF (crf 32, 4:4:4)	−33 %	0.96	same as above	~460 ms / —
H.264 (crf 23)	100 % (reference)	—	Universal	— / ~2× real-time
VP9 (crf 33)	−34 %	—	HW: Snapdragon 855+, most x86	— / ~6× real-time
AV1 / SVT-AV1 (crf 30)	−47 %	—	HW: Apple M3 / A17 Pro and later, newer Android SoCs; software decode elsewhere in supporting browsers	— / ~3× real-time (SVT preset 6)

Key observation: AVIF’s 48 % size saving over JPEG is the largest available for raster images with equivalent perceived quality. The encoding penalty (≈19× slower than JPEG) is paid at build time or in an edge worker, not by the user. AVIF should be the first <source> for all photographic content on sites targeting Chrome 85+ and Safari 16+.

Browser and CDN compatibility matrix

Feature	Safari 14	Safari 16	Chrome 85+	Firefox 93+	Edge 18+	Cloudflare	Fastly	AWS CloudFront
`srcset` w descriptor	Yes	Yes	Yes	Yes	Yes	Pass-through	Pass-through	Pass-through
`<picture>` / `<source type>`	Yes	Yes	Yes	Yes	Yes	Pass-through	Pass-through	Pass-through
AVIF decode	No	Yes	Yes (85+)	Yes (93+)	Yes (121+)	Image Resizing	No native	No native
WebP decode	Yes (14+)	Yes	Yes	Yes	Yes (18+)	Image Resizing	Partial VCL	Lambda@Edge
`fetchpriority` attribute	No	No (added in 17.2)	Yes (102+)	Yes (132+)	Yes (102+)	—	—	—
CSS `@container` queries	No	Yes (16+)	Yes (105+)	Yes (110+)	Yes (105+)	—	—	—
Native `loading="lazy"`	No (14)	Yes (15.4+)	Yes (77+)	Yes (75+)	Yes (79+)	—	—	—
AV1 video decode	No	No	Yes (70+, desktop)	Yes (67+)	Yes (121+)	—	—	—
VP9 video decode	Partial	Yes (16+)	Yes	Yes	Yes	—	—	—

Safari 14 notes: No AVIF, no native lazy loading, no fetchpriority. Serve WebP via <picture> fallback; use an IntersectionObserver-based fallback for lazy loading; leave fetchpriority in the markup (WebKit in Safari 14 ignores the unknown attribute, so the image simply loads at its default priority).

Canonical code pattern: production `<picture>` with full fallback chain

<!--
  Production-ready responsive image with format fallback and layout stability.
  Place this pattern for every LCP candidate (hero images, above-fold cards).
-->
<picture>
  <!--
    AVIF source: best compression, ~48 % smaller than JPEG at matched quality.
    srcset uses w-descriptors so the browser can pick the optimal width variant.
    sizes tells the browser the rendered width at each breakpoint BEFORE layout
    is computed — this is the only hint the preload scanner reads.
    Omitting sizes defaults to 100vw, causing the browser to over-fetch
    on multi-column layouts.
  -->
  <source
    type="image/avif"
    srcset="
      /media/hero-400.avif   400w,
      /media/hero-800.avif   800w,
      /media/hero-1200.avif 1200w,
      /media/hero-1600.avif 1600w
    "
    sizes="
      (max-width: 600px)  100vw,
      (max-width: 1200px) 50vw,
      800px
    "
  >
  <!--
    WebP fallback: covers Safari 14+, older Chrome/Firefox.
    Must duplicate the srcset — browsers evaluate <source> elements top-to-bottom
    and stop at the first type match; they do NOT fall back within a type.
  -->
  <source
    type="image/webp"
    srcset="
      /media/hero-400.webp   400w,
      /media/hero-800.webp   800w,
      /media/hero-1200.webp 1200w,
      /media/hero-1600.webp 1600w
    "
    sizes="
      (max-width: 600px)  100vw,
      (max-width: 1200px) 50vw,
      800px
    "
  >
  <!--
    JPEG ultimate fallback: always present, covers IE 11, older Safari, crawlers.
    width + height are MANDATORY: they establish the aspect ratio before the
    image loads, preventing CLS. Use the intrinsic pixel dimensions of the src
    file (here 1200 x 630); do not multiply by the device pixel ratio.
    loading="eager" + fetchpriority="high" puts this in the highest-priority
    queue in Blink. Use ONLY on the primary LCP element; applying fetchpriority=high
    to multiple images starves CSS and other critical resources.
    decoding="async" is a hint that lets the browser decode off the critical
    rendering path; measure LCP with and without it on the hero image.
  -->
  <img
    src="/media/hero-1200.jpg"
    alt="Dashboard analytics visualization showing real-time media delivery metrics"
    width="1200"
    height="630"
    loading="eager"
    fetchpriority="high"
    decoding="async"
  >
</picture>
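Hand-writing the same width list for every format is error-prone, so a small helper can generate the srcset strings used in the pattern above. This is a minimal sketch; the function name is hypothetical and it assumes the `base-width.ext` naming shown in the markup.

```javascript
// Build a srcset value such as "/media/hero-400.avif 400w, /media/hero-800.avif 800w".
function buildSrcset(basePath, widths, ext) {
  return widths.map((w) => `${basePath}-${w}.${ext} ${w}w`).join(', ');
}

const widths = [400, 800, 1200, 1600];
for (const ext of ['avif', 'webp']) {
  console.log(`${ext}: ${buildSrcset('/media/hero', widths, ext)}`);
}
```

Generating the AVIF and WebP lists from one array keeps the two <source> elements in step when a width is added or removed.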

Pipeline integration

Build-time image encoding with Sharp

Sharp wraps libvips and runs in a Node.js CI step. The key pipeline concern is generating a consistent srcset manifest alongside the variant files so the HTML can reference the correct filenames.

// scripts/generate-image-variants.mjs
// Run in CI before the 11ty/Next.js build step.
import sharp from 'sharp';
import { mkdirSync, writeFileSync } from 'node:fs';

const OUT_DIR = 'public/media';
const WIDTHS = [400, 800, 1200, 1600];
const FORMATS = [
  {
    ext: 'avif',
    opts: {
      quality: 60,       // AVIF quality scale is 1–100 (higher = better quality, larger file)
      effort: 4,         // Encode effort 0–9; 4 is sharp's default and balances CI time vs compression
      chromaSubsampling: '4:2:0'  // Use '4:4:4' for UI screenshots with text overlays
    }
  },
  {
    ext: 'webp',
    opts: {
      quality: 82,       // WebP quality 82 ≈ JPEG quality 85 perceptually
      effort: 4,         // WebP effort range is 0–6
      smartSubsample: true  // Preserves colour accuracy near hard edges
    }
  },
  {
    ext: 'jpg',
    opts: {
      quality: 85,
      progressive: true,  // Progressive JPEG renders coarse-to-fine, improving perceived speed
      mozjpeg: true        // Enables mozjpeg encoder; ~5-10 % smaller than libjpeg at same quality
    }
  }
];

const srcImages = ['src/images/hero.jpg', 'src/images/team.jpg'];
const manifest = {};  // base name -> format -> srcset string

// Make sure the output directory exists before sharp writes into it.
mkdirSync(OUT_DIR, { recursive: true });

for (const src of srcImages) {
  const base = src.replace(/^src\/images\//, '').replace(/\.\w+$/, '');
  const pipeline = sharp(src);
  manifest[base] = {};
  for (const { ext, opts } of FORMATS) {
    const entries = [];
    for (const width of WIDTHS) {
      const file = `${base}-${width}.${ext}`;
      await pipeline
        .clone()
        .resize(width)
        .toFormat(ext, opts)
        .toFile(`${OUT_DIR}/${file}`);
      entries.push(`/media/${file} ${width}w`);
    }
    manifest[base][ext] = entries.join(', ');
  }
}

// The manifest filename and shape are assumptions; adapt them to whatever your templates read.
writeFileSync(`${OUT_DIR}/srcset-manifest.json`, JSON.stringify(manifest, null, 2));

Build-time video encoding with FFmpeg

The pipeline below covers the full codec fallback chain: single-pass AV1, two-pass VP9, and single-pass H.264. The H.264 variant is always the final fallback; encode it last so its simpler parameter set does not accidentally become the template for the more complex VP9/AV1 commands.

#!/usr/bin/env bash
# encode-video-variants.sh
# Produces three codec variants for a short background loop (< 60 s).
# For longer content, generate multi-bitrate HLS with ffmpeg -hls_segment_type fmp4.

INPUT="$1"
BASE="${INPUT%.*}"

# --- AV1 via SVT-AV1 (libsvtav1) ---
# preset 6: fast enough for CI; preset 4–5 for maximum compression offline.
# crf 30: target quality (lower = higher quality, larger file; range 0–63 for SVT-AV1).
# Output is MP4 here; AV1 can also be muxed into WebM if you prefer that container.
# -movflags +faststart moves the moov atom to the file head for progressive playback.
# (Keep comments off the continued lines: text after a trailing backslash breaks the command.)
ffmpeg -i "$INPUT" \
  -c:v libsvtav1 -preset 6 -crf 30 \
  -c:a libopus -b:a 96k \
  -movflags +faststart \
  "${BASE}_av1.mp4"

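# --- VP9 (two-pass, constant quality) ---
# -b:v 0 together with -crf selects constant-quality mode rather than a target bitrate.
# The first pass gathers statistics that let the second pass compress more efficiently;
# two-pass VP9 produces ~15 % smaller files than single-pass CRF for a given quality target.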
ffmpeg -i "$INPUT" \
  -c:v libvpx-vp9 -b:v 0 -crf 33 \
  -pass 1 -an -f null /dev/null && \
ffmpeg -i "$INPUT" \
  -c:v libvpx-vp9 -b:v 0 -crf 33 \
  -c:a libopus -b:a 96k \
  -pass 2 \
  "${BASE}_vp9.webm"

# --- H.264 (single-pass CRF, faststart for web) ---
# -preset medium balances encode speed vs compression efficiency.
# -profile:v main covers all modern devices; change to baseline only for legacy Android 4.x.
# -pix_fmt yuv420p: main profile only supports 8-bit 4:2:0, so convert in case the source is 4:2:2 or 4:4:4.
ffmpeg -i "$INPUT" \
  -c:v libx264 -preset medium -crf 23 \
  -profile:v main -level 4.0 -pix_fmt yuv420p \
  -c:a aac -b:a 128k \
  -movflags +faststart \
  "${BASE}_h264.mp4"

Edge-side format negotiation (Nginx)

For origin servers that prefer server-driven negotiation over <picture> markup, the following Nginx map block routes requests to the correct format based on the Accept header. This requires correctly configured Vary: Accept response headers — without them, a CDN in front will serve a cached AVIF response to a browser that only sent image/jpeg in its Accept header.

# nginx.conf — image format negotiation via Accept header
# Requires: pre-generated AVIF and WebP variants at the same path with .avif / .webp suffix.

http {
  # Map Accept header to best supported format
  # Order matters: regexes are tested top to bottom, so AVIF is checked before WebP;
  # "default" (the original JPEG/PNG) applies when neither matches.
  map $http_accept $img_suffix {
    default        "";
    "~*image/avif" ".avif";  # Chrome 85+, Firefox 93+, Safari 16+
    "~*image/webp" ".webp";  # Safari 14+, Chrome, Firefox, Edge
  }

  server {
    location ~* \.(jpe?g|png)$ {
      # Try the format-suffixed path first; fall back to the original.
      # $uri$img_suffix resolves to e.g. /media/hero.jpg.avif when AVIF is accepted.
      # If the .avif file is missing, this falls back to the original, not to .webp.
      try_files $uri$img_suffix $uri =404;

      # CRITICAL: Vary: Accept is mandatory.
      # Without it, a CDN caches the first response (AVIF or WebP) and serves it
      # to ALL clients regardless of their Accept header — breaking older browsers.
      add_header Vary Accept;

      # Long cache lifetime is safe because filenames are content-hashed.
      # Use a shorter max-age (e.g. 86400) if files are not content-hashed.
      add_header Cache-Control "public, max-age=31536000, immutable";
    }
  }
}

Tradeoffs and failure modes

Failure mode	Trigger condition	Mitigation
CDN serves the wrong format from cache	`Vary: Accept` header missing on image responses	Always set `Vary: Accept`; verify with `curl -sI` inspecting the response headers
AVIF encode time blows CI budget	Large image set, effort ≥ 6	Reduce effort to 4; run encoding in a dedicated parallel step; cache build artefacts between CI runs
`fetchpriority=high` starvation	Applied to more than one image per page	Reserve `fetchpriority=high` for exactly the single LCP candidate; all other images use the default
`IntersectionObserver` rootMargin over-eager	Large positive rootMargin on fast scrollers	Start at `'200px'`; measure buffering events in RUM; tighten to `'50px'` if bandwidth is constrained
AVIF served to Safari 14 (no `<picture>`)	Server-side negotiation without `Accept` check	Safari 14 sends `Accept: image/webp,*/*` and never `image/avif`; the Nginx map above handles this correctly
`loading="lazy"` ignored on Safari 14	Native lazy loading unsupported	Detect support with `'loading' in HTMLImageElement.prototype`; fall back to `IntersectionObserver` — see advanced IntersectionObserver patterns
CLS from missing `width`/`height`	Dimensions omitted from `<img>`	Always set `width` and `height` attributes to the image's intrinsic dimensions (or any pair with the same aspect ratio); let CSS `max-width: 100%; height: auto` handle responsiveness
`@container` query fallback gap	Container queries unsupported (pre-Chrome 105)	Write a baseline viewport media query first; `@container` will override it in supporting browsers via the cascade
VP9 hardware decode absent on mid-range Android	Devices pre-Snapdragon 855	Include H.264 as the final `<source>` fallback; test on Moto G series in BrowserStack

Debugging and performance telemetry

Identifying the LCP element

Open Chrome DevTools, run a Lighthouse audit, and expand the “Largest Contentful Paint element” diagnostic. The element path will confirm whether the LCP candidate is your intended hero <img> or something else (a background <div>, a <video> poster, etc.). If the LCP element is a CSS background image, a fetchpriority attribute cannot help directly because there is no <img> to carry it, and the preload scanner cannot discover the asset until the CSSOM has been constructed. Either convert it to an inline <img> or add an explicit <link rel="preload" as="image" fetchpriority="high"> in <head>. For more on prioritising critical assets, see using fetchpriority to optimise critical media.

Verifying format negotiation

# Confirm AVIF is being served to a Chrome-like Accept header.
# grep -i is needed because header-name casing varies between HTTP/1.1 and HTTP/2.
curl -sI -H "Accept: image/avif,image/webp,*/*" https://example.com/media/hero.jpg \
  | grep -iE "content-type|vary|cache-control"

# Expected output (server-side negotiation):
# content-type: image/avif
# vary: Accept
# cache-control: public, max-age=31536000, immutable

# Confirm the non-AVIF fallback for a Safari 14-like Accept header
curl -sI -H "Accept: image/webp,*/*" https://example.com/media/hero.jpg \
  | grep -i content-type
# Expected: content-type: image/webp  (or image/jpeg if no WebP variant exists on disk)

RUM telemetry targets

Instrument your RUM pipeline (web-vitals.js or equivalent) to capture:

LCP: target < 2.5 s on the 75th percentile across mobile connections.
CLS: target < 0.1; a layout shift above 0.05 during image load usually points to missing width/height attributes.
INP: target < 200 ms; synchronous image decode on the main thread is a common contributor, so consider decoding="async" on non-critical images.
Cache hit rate: monitor CDN hit ratio per MIME type; a drop in AVIF hit rate often signals a Vary: Accept misconfiguration or a deploy that cleared the format-keyed cache entries.

When LCP exceeds 2.5 s at the 75th percentile, common causes include: no fetchpriority=high on the LCP image, the LCP image discovered late (CSS background, dynamically injected <img>), an oversized variant fetched because sizes is missing or inaccurate, and origin TTFB exceeding 600 ms due to absent CDN caching.
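The 75th-percentile check itself is simple to compute once raw samples are collected. The sketch below uses the nearest-rank method on a hypothetical batch of LCP samples; the function name and the sample values are illustrative only, and a real RUM backend may use a different percentile definition.

```javascript
// Nearest-rank percentile: the smallest sample with at least p % of all
// samples at or below it. Returns NaN for an empty input.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical LCP samples in milliseconds from one reporting window.
const lcpSamplesMs = [1800, 2100, 2300, 2600, 3900, 1500, 2200, 2450];
const LCP_BUDGET_MS = 2500;

const p75 = percentile(lcpSamplesMs, 75);
console.log(`p75 LCP: ${p75} ms, within budget: ${p75 <= LCP_BUDGET_MS}`);
// p75 LCP: 2450 ms, within budget: true
```

Note that two of the eight samples exceed 2.5 s and the page still passes, because the target is defined at the 75th percentile rather than at the maximum.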

Mastering srcset and sizes for Responsive Layouts — how the browser’s resource-selection algorithm works and how to author descriptor sets that minimise waste
Art Direction with the HTML Picture Element — crop-level composition control and format negotiation via <picture> / <source type>
CSS Container Queries for Dynamic Media Sizing — component-relative media sizing that decouples breakpoints from the viewport
Responsive Video Delivery in Next.js and React — framework-native patterns for lazy video, poster optimisation, and playback state
Cache-Control headers for image and video assets — max-age, immutable, and the Vary: Accept requirement for format-negotiated responses
Using fetchpriority to optimise critical media — elevating the LCP candidate’s fetch priority without starving other critical resources