Website Spec

The checklist

Every spec item, flat and tickable. Use as a self-audit. Print-friendly.

Foundations

The HTML, head, and document basics every page needs.

  • The HTML doctype Required

    Every HTML document must start with <!doctype html> as its first line. This opts the browser into standards mode; without it, you get quirks mode and broken layout.

  • Set a valid BCP 47 language tag on the <html> element so screen readers, translators, search engines, and browsers know what language the page is in.

  • <meta charset> Required

    Declare UTF-8 as the document character encoding in the first 1024 bytes of the HTML, so browsers parse text correctly before they hit any non-ASCII content.

  • <meta viewport> Required

    Tell mobile browsers to render the page at the device's actual width instead of pretending to be a 980-pixel desktop. One line, and never disable user scaling.

  • Every HTML document must have exactly one non-empty <title> element inside <head>. It is used by browsers, search engines, screen readers, social previews, and AI agents.

  • The meta description is a short, unique summary of the page that search engines and social platforms use as a snippet. Google may rewrite it, but a good one is rewritten less often.

  • Declare the preferred URL for a page so search engines and crawlers consolidate ranking signals on one address, even when several URLs serve the same content.

  • Ship an SVG favicon, an ICO fallback at /favicon.ico, an apple-touch-icon, and a maskable PWA icon. Together these files cover every browser and home-screen surface.

  • The theme-color meta tag tints the browser chrome and OS surfaces to match your brand. Use the media attribute to ship one colour for light mode and another for dark mode.

  • The color-scheme declaration (meta tag or CSS property) tells the browser which colour schemes your page is designed for. It prevents the white flash that dark-mode users see before your CSS loads, and lets the browser style scrollbars, form controls, and the page background to match.

  • Open Graph protocol Recommended

    Open Graph tags control how pages look when shared on social platforms and chat apps. Set og:title, og:description, og:image, og:url, and og:type on every page.

  • If your site publishes a feed — RSS, Atom, or JSON Feed — announce it in <head> with <link rel="alternate">. Feed readers, agents, and browsers discover it without guessing the URL.

  • Feed content hygiene Recommended

    If you publish a feed, ship it well-formed. Identify the feed inside itself with atom:link rel="self", give every item a stable guid, declare an update cadence with the Syndication module, and validate before deploy.

  • Popover API Recommended

    Replace JavaScript modals, menus, and tooltips that rely on hand-wired ARIA with a native top-layer primitive that the browser opens, closes, and wires up for accessibility for you.
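
The required Foundations items above lend themselves to an automated self-audit. The following is a minimal sketch using only Python's standard library; the class and function names (`HeadAudit`, `audit_foundations`) and the problem messages are illustrative, and it does not check the "first 1024 bytes" rule for the charset declaration.

```python
from html.parser import HTMLParser


class HeadAudit(HTMLParser):
    """Collects the few facts the Foundations checks need."""

    def __init__(self) -> None:
        super().__init__()
        self.lang = None       # value of <html lang>
        self.charset = None    # value of <meta charset>
        self.viewport = None   # content of <meta name="viewport">
        self.titles = []       # text of every <title> element
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "html":
            self.lang = a.get("lang")
        elif tag == "meta" and "charset" in a:
            self.charset = a["charset"] or ""
        elif tag == "meta" and (a.get("name") or "").lower() == "viewport":
            self.viewport = a.get("content") or ""
        elif tag == "title":
            self._in_title = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.titles[-1] += data


def audit_foundations(html: str) -> list[str]:
    """Return a list of human-readable problems; empty means all checks passed."""
    page = HeadAudit()
    page.feed(html)
    page.close()
    problems = []
    if not html.lstrip().lower().startswith("<!doctype html>"):
        problems.append("missing <!doctype html> at the top")
    if not page.lang:
        problems.append("missing <html lang>")
    if (page.charset or "").lower() != "utf-8":
        problems.append("missing <meta charset=utf-8>")
    if page.viewport is None:
        problems.append("missing <meta name=viewport>")
    elif "user-scalable=no" in page.viewport.replace(" ", "").lower():
        problems.append("viewport disables user scaling")
    if len(page.titles) != 1 or not page.titles[0].strip():
        problems.append("need exactly one non-empty <title>")
    return problems
```

Running `audit_foundations` over each built page in CI is one way to keep the required items from regressing; an empty list means every check in this sketch passed.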

SEO

Search visibility — robots.txt, sitemaps, canonicals, structured data.

  • robots.txt Recommended

    A plain-text file at the site root that tells crawlers which paths they may or may not fetch. Standardised in RFC 9309 and supported by every major search engine.

  • XML sitemaps Recommended

    An XML file listing the canonical URLs of a site, with optional metadata about when each was last changed. The fastest way to tell a search engine what exists.

  • Sitemap index files Recommended

    A sitemap of sitemaps. Used when a site has more than 50,000 URLs or wants to split sitemaps by content type for cleaner reporting.

  • Optional XML extensions that add image and video metadata to sitemap entries. Useful when media is loaded by JavaScript or hosted on a CDN that crawlers cannot reach by following links.

  • URL structure Recommended

    URLs are the most stable identifier on the web. Keep them lowercase, hyphenated, descriptive, and shallow. Treat them as a public API for your content.

  • HTTP redirects send a client from one URL to another. Use 301 or 308 for permanent moves, 302 or 307 for temporary ones, and never chain more than necessary.

  • Soft 404s Avoid

    A page that looks like a 'not found' message to a user but returns 200 OK to a crawler. Search engines treat soft 404s as a quality problem and often refuse to index them.

  • Every page must have a deliberate, correct indexing policy: either the implicit default (index, follow) on public pages, or an explicit noindex (robots meta tag or X-Robots-Tag header) on staging, admin, thin, or private content. Get this wrong and you either disappear from search or expose what you didn't mean to.

  • Headings describe the sections of a page. They must form a nested outline, never be used for visual styling alone, and never skip levels.

  • Internal linking Recommended

    Links from one page on a site to another. The strongest signal you control for telling crawlers and AI agents what a page is about and how important it is.

  • Structured data is a set of machine-readable annotations that describe the content of a page using the schema.org vocabulary. JSON-LD is the format search engines and AI agents expect.

  • Breadcrumbs Recommended

    A short trail showing the page's position in the site hierarchy. Visible in the UI for users, marked up as BreadcrumbList JSON-LD for search engines.

  • IndexNow Optional

    An open protocol for telling participating search engines that a URL has changed. One HTTP request pushes Bing, Yandex, Naver, and Seznam to recrawl — Google does not participate.
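
To illustrate the robots.txt item above, here is a small sketch that checks a rule set offline with Python's standard `urllib.robotparser`. The file content and the example.com URLs are hypothetical. Note that Python's parser applies the first matching rule rather than the longest match, so the narrower Disallow is deliberately listed before the broad Allow.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for example.com (not taken from any real site).
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""


def load_rules(text: str) -> RobotFileParser:
    """Parse robots.txt content without touching the network."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
print(rules.can_fetch("*", "https://example.com/blog/post"))    # public page
print(rules.can_fetch("*", "https://example.com/admin/users"))  # blocked path
print(rules.site_maps())                                        # declared sitemaps
```

A check like this can run before deploy to catch a rule that accidentally blocks public pages or leaves a private path open.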

Accessibility

WCAG-aligned rules so people of all abilities can use the site.

  • Colour contrast Required

    Text and meaningful non-text elements must have enough contrast against their background so people with low vision and people in harsh light can read them.

  • Image alt text Required

    Every <img> element must have an alt attribute. The value describes the image's purpose to screen readers, search engines, and anyone whose image fails to load.

  • Form labels Required

    Every form control needs a programmatically associated label. A placeholder is not a label, and an unlabelled input is unusable for screen-reader and voice-control users.

  • Every interactive element on the page must be reachable and operable with a keyboard alone, in a logical order, with no traps that hold focus.

  • Whenever a control receives keyboard focus, the page must show a clear, high-contrast indicator. Removing focus outlines without a replacement is a top accessibility failure.

  • Skip links Recommended

    A 'skip to main content' link as the first focusable element lets keyboard and screen-reader users jump past repeated navigation on every page.

  • Use the right HTML element for the job. Landmarks like <header>, <nav>, <main>, and <footer> let assistive technologies announce structure and skip between regions.

  • ARIA can make custom widgets accessible, but the first rule of ARIA is don't use ARIA. Reach for a native HTML element first; add ARIA only when nothing native fits.

  • Every link's text must describe where it goes. 'Click here' and 'read more' fail screen-reader users who scan a page by jumping from link to link.

  • A link or button with no accessible name is invisible to screen readers and unreachable for voice control. Icon-only controls without a label are the usual culprit.

  • When a form submission fails, errors must be identified in text, associated with the input that caused them, and announced to assistive technology.

  • Set the page's primary language on <html lang> and mark any inline content in a different language with its own lang attribute, so screen readers pronounce it correctly.

  • Reduced motion Required

    Respect the user's `prefers-reduced-motion` setting. Decorative animation, parallax, and autoplay can trigger vestibular distress, migraines, and seizures.

  • Accessibility overlays are third-party JavaScript widgets that claim to make a site WCAG-compliant at runtime. They do not work, often harm screen-reader users, and attract lawsuits.

  • Video needs synchronised captions, audio-only content needs a transcript, and visuals that carry meaning need audio description. Auto-captions alone are not enough.

  • Tabular data must use real <table> markup with a caption, header cells, and scope attributes so screen readers can announce row and column relationships.

  • Interactive controls must be large enough to tap or click reliably. WCAG 2.2 sets a 24×24 CSS px minimum, with 44×44 CSS px as the enhanced target.

  • Hidden until found Recommended

    Use hidden="until-found" for collapsible content so that browser find-in-page, assistive tech, and search engines can still reach the text and auto-expand it. Plain content-visibility: hidden does not do this: content hidden that way is skipped by find-in-page.

  • Prefer native HTML interactive elements — <button>, <a>, <details>/<summary>, <dialog> — over divs with click handlers. You get keyboard support, focus management, and assistive-tech semantics for free.

  • Use `:has()` together with `:user-invalid`, `:user-valid`, `:placeholder-shown` and `:focus-within` to express form and component state in CSS, removing the JavaScript class-toggling pattern and the race conditions it brings.
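
The colour contrast item above can be checked numerically. The sketch below computes the WCAG 2.x contrast ratio from two hex colours. The 4.5:1 (normal text) and 3:1 (large text) thresholds come from WCAG success criterion 1.4.3 rather than from this checklist, and the function names are illustrative.

```python
def _linear(channel: int) -> float:
    """Convert an 8-bit sRGB channel to its linear-light value (WCAG 2.x formula)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_colour: str) -> float:
    """Relative luminance of a colour written as #rrggbb."""
    h = hex_colour.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """Ratio between 1 and 21; the order of the two colours does not matter."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(foreground: str, background: str, large_text: bool = False) -> bool:
    """AA thresholds assumed from WCAG SC 1.4.3: 4.5:1 normal, 3:1 large text."""
    return contrast_ratio(foreground, background) >= (3.0 if large_text else 4.5)
```

For example, `passes_aa("#767676", "#ffffff")` passes for body text while the slightly lighter `#777777` does not, which is why near-threshold greys are worth checking rather than eyeballing.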

Security

Headers, transport, and policies that keep visitors safe.

  • HTTPS and TLS Required

    Serve every page over HTTPS using TLS 1.2 or 1.3, redirect plain HTTP to HTTPS, and disable obsolete SSL and early TLS versions on every host you control.

  • HSTS tells browsers to use only HTTPS for your domain for a long time. Add max-age, includeSubDomains, and preload, but understand that preloading is a long-lived commitment that is slow and hard to reverse.

  • A CSP tells browsers which sources of script, style, image, and frame content to trust. A good policy stops most XSS and data-exfiltration attacks dead.

  • A standard text file at /.well-known/security.txt tells security researchers how to report vulnerabilities. It is cheap to publish and dramatically lowers the bar for responsible disclosure.

  • The nosniff header stops browsers from guessing a response's content type. It blocks a class of attacks where a benign-looking file is interpreted as script or stylesheet.

  • Tell browsers who is allowed to embed your pages in an iframe. Use CSP frame-ancestors. X-Frame-Options is the legacy fallback.

  • Referrer-Policy Recommended

    Referrer-Policy controls how much URL information your site leaks when users follow a link or load a subresource. strict-origin-when-cross-origin is the sensible default.

  • Permissions-Policy Recommended

    Permissions-Policy lets you turn off powerful browser features — camera, microphone, geolocation, payment, USB — for your own pages and for any iframes you embed.

  • SRI adds a cryptographic hash to every third-party script and stylesheet so the browser refuses to run modified files. Essential for any external JS or CSS you depend on.

  • Every cookie should be Secure, HttpOnly where possible, and have an explicit SameSite. Use __Host- and __Secure- prefixes for session cookies.

  • DNS CAA records Recommended

    A CAA record tells certificate authorities which of them are allowed to issue certificates for your domain. Cheap to add, blocks a class of mis-issuance attacks.

  • DNSSEC Optional

    DNSSEC cryptographically signs DNS records so resolvers can verify they have not been tampered with. Strong defence in depth, but only with full registrar and registry support.
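
The header-based items in this section can be verified against a captured response. The sketch below takes a dictionary of response headers and reports what is missing. The one-year HSTS minimum is an assumption about what "a long time" means, and the function name and messages are illustrative; it does not judge whether a CSP is actually strict.

```python
import re

ONE_YEAR = 31536000  # seconds; assumed minimum for a long-lived HSTS policy


def audit_security_headers(headers: dict[str, str]) -> list[str]:
    """Return a list of problems found in a response's security headers."""
    h = {name.lower(): value for name, value in headers.items()}
    problems = []

    hsts = re.search(r"max-age=(\d+)", h.get("strict-transport-security", ""), re.I)
    if hsts is None:
        problems.append("Strict-Transport-Security missing or has no max-age")
    elif int(hsts.group(1)) < ONE_YEAR:
        problems.append("HSTS max-age is shorter than one year")

    csp = h.get("content-security-policy", "")
    if not csp:
        problems.append("Content-Security-Policy missing")
    # Either CSP frame-ancestors or the legacy X-Frame-Options counts here.
    if "frame-ancestors" not in csp.lower() and "x-frame-options" not in h:
        problems.append("no framing policy (frame-ancestors or X-Frame-Options)")

    if h.get("x-content-type-options", "").strip().lower() != "nosniff":
        problems.append("X-Content-Type-Options is not nosniff")

    for name in ("referrer-policy", "permissions-policy"):
        if name not in h:
            problems.append(f"{name} missing")
    return problems
```

Feeding it the headers from a real response (for instance, collected by an HTTP client in a smoke test) gives a quick pass or fail signal per header.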

Well-Known URIs

Standard, agreed-upon paths under /.well-known/.

  • Well-known URIs Recommended

    The /.well-known/ path prefix is a standardised place to publish site-level metadata. RFC 8615 defines it; IANA keeps the registry of allowed names.

  • /.well-known/change-password is a standard redirect endpoint that points password managers and users at your real change-password page. It applies only if the site has user accounts; sites without logins have nothing to point at and should not implement it.

  • /.well-known/openid-configuration is a JSON discovery document that describes an OpenID Connect provider's endpoints and capabilities. Only required if you are an OIDC identity provider.

  • RFC 9727 publishes a machine-readable index of the APIs and resources a host exposes. Served as a Linkset (RFC 9264) JSON document, discoverable via the api-catalog link relation.

  • WebFinger (RFC 7033) resolves an account identifier in the acct:user@host form to a set of links. The Fediverse uses it to discover ActivityPub actors.

  • apple-app-site-association is a JSON file that tells iOS, iPadOS and macOS which Apple apps may handle which URLs on your domain. Required for Universal Links and several other Apple features.

  • Android's Digital Asset Links file proves that an Android app and a web domain are owned by the same entity. It powers App Links and Smart Lock for Passwords.

  • /.well-known/nodeinfo is a discovery URI for federated platforms. It returns links to NodeInfo documents that describe the software, version and basic statistics of a server.

  • /.well-known/traffic-advice is a JSON file that tells private prefetch proxies (most notably Chrome's) whether to send prefetch traffic to your origin, and at what fraction. It is an optional opt-out and throttle mechanism, provisionally registered with IANA.
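
As an illustration of the WebFinger item above, the sketch below builds the JSON Resource Descriptor (JRD) a server might return from /.well-known/webfinger for an `acct:` resource. The account name, the host, and the `/users/<name>` actor URL layout are all hypothetical; real servers choose their own paths and usually return more links.

```python
import json


def webfinger_jrd(resource: str, host: str) -> dict:
    """Build a JRD for acct:user@host, or raise if this host does not own it."""
    if not resource.startswith("acct:"):
        raise ValueError("only acct: URIs are handled in this sketch")
    user, _, domain = resource[len("acct:"):].rpartition("@")
    if not user or domain.lower() != host.lower():
        raise LookupError("unknown account")
    return {
        "subject": resource,
        "links": [
            {
                "rel": "self",
                "type": "application/activity+json",
                # Hypothetical actor URL layout, chosen for illustration only.
                "href": f"https://{host}/users/{user}",
            }
        ],
    }


print(json.dumps(webfinger_jrd("acct:alice@example.com", "example.com"), indent=2))
```

The response would be served with the `application/jrd+json` media type; a lookup for an account on another host should produce a 404 rather than a guessed answer, which is what the `LookupError` stands in for here.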

Agent Readiness

Things that make a site legible to AI agents and crawlers.

  • Agent readiness Recommended

    Agent readiness is the set of choices that make a site legible to AI agents and LLMs: stable URLs, structured data, clean semantics, robots controls, and machine-readable endpoints.

  • /llms.txt Recommended

    A proposed markdown file at the site root that gives LLMs a curated index of your most important content. Emerging convention, not a ratified standard.

  • /llms-full.txt Optional

    An extended companion to /llms.txt that concatenates the full markdown content of your key pages into a single file. Useful for small sites, costly for large ones.

  • Expose every documentation page's raw Markdown source at a predictable URL — via a .md suffix on the canonical URL, content negotiation, or both. Agents pull source instead of parsing HTML.

  • Major AI vendors publish named user-agents for their crawlers. Setting an explicit allow or disallow per agent is the clearest way to control how your content is used.

  • Add Content-Signal directives to robots.txt to declare whether AI crawlers may search, ingest, or train on your content. An emerging IETF AI Preferences / IAB Tech Lab proposal that some validators already check.

  • Web Bot Auth lets a bot prove who it is by signing each HTTP request with a key it controls. Sites can then allow or block specific bots without IP allow-lists, user-agent strings, or guesswork. Built on RFC 9421 HTTP Message Signatures.

  • Stable URLs Required

    URLs are public contracts. Once published, they should keep working. Breaking them invalidates citations, bookmarks, links, and agent caches — and is almost always avoidable.

  • JSON-LD with schema.org types gives agents typed facts about your page. It is the same markup search engines use, and agents lean on it just as heavily.

  • Offer JSON, RSS, or plain markdown endpoints alongside HTML where it makes sense. Agents and feed readers prefer typed data over scraped HTML.

  • Use the HTTP Link header to advertise machine-readable resources — llms.txt, sitemap, api-catalog, RSS — directly in the response. Agents that never parse your HTML can still find what they need.

  • The Model Context Protocol is an emerging way for sites to expose queryable tools to agents over JSON-RPC. Relevant whenever your content has structure worth filtering — even for a static reference site like this one.

  • A2A agent cards Optional

    The Agent-to-Agent (A2A) protocol lets an autonomous agent find another autonomous agent and call it over JSON-RPC. Discovery hinges on a single well-known file: `/.well-known/agent-card.json`. Relevant whenever your service exposes agentic behaviour another agent might want to delegate to.

  • A well-known URI that lists Agent Skills — short, scoped instructions an AI agent can load to work better with your site. Emerging convention via a Cloudflare-led RFC; still draft, still cheap to ship.

  • Publish SVCB/HTTPS records under _agents.example.com so agents can discover your services from DNS, before any HTTP round-trip. Pair with DNSSEC so the answer is authenticated.

  • NLWeb is an emerging convention for exposing a site as a conversational AI endpoint. A site advertises an `/ask`-style endpoint via a `rel="nlweb"` link and serves an MCP-compatible JSON-RPC interface that agents can query in natural language.

  • WebMCP lets a page register tools that an in-browser AI agent can call directly, using a `navigator.modelContext` JavaScript API. It turns a site into an agent surface without server-side MCP plumbing.

  • A convention this site proposes — no external standard exists yet. `/schemamap.xml` indexes one JSON-LD endpoint per resource so agents fetch the structured-data graph directly instead of extracting it from HTML.
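
For the /llms.txt item above, a generator can be only a few lines. The sketch below renders the layout described in the public llms.txt proposal as understood here (an H1 title, a blockquote summary, then H2 sections of Markdown links); since the convention is not a ratified standard, treat the exact layout as an assumption. The site name, URLs, and function name are hypothetical.

```python
def render_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
) -> str:
    """Render an llms.txt body from (link name, URL, note) tuples per section."""
    lines = [f"# {title}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for name, url, note in links:
            lines.append(f"- [{name}]({url}): {note}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical site content, for illustration only.
print(render_llms_txt(
    "Example Docs",
    "Reference documentation for the Example product.",
    {
        "Docs": [
            ("Quick start", "https://example.com/docs/quick-start.md", "Install and first run"),
            ("API reference", "https://example.com/docs/api.md", "Endpoints and parameters"),
        ],
    },
))
```

Linking to the `.md` variants of pages, as in this example, pairs naturally with the item about exposing raw Markdown source at predictable URLs.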

Performance

Core Web Vitals, caching, images, fonts, network behaviour.

  • Core Web Vitals measure loading, responsiveness, and visual stability. Hit LCP ≤ 2.5s, INP ≤ 200ms, and CLS ≤ 0.1 at the 75th percentile of real users.

  • Serve images in modern formats (WebP, AVIF), at the right size for the viewport, with explicit dimensions. Images are the largest payload on most pages.

  • Native lazy loading defers off-screen images, iframes, and (recently) video until the user scrolls near them. Use loading="lazy" — but never on the LCP element.

  • Resource hints let you tell the browser what is coming. Preload the LCP image and critical fonts, preconnect to third-party origins, prefetch the next navigation.

  • Cache-Control tells browsers and CDNs how long to keep a response. Use immutable + max-age=31536000 for fingerprinted assets and short or no-cache for HTML.

  • The `No-Vary-Search` response header tells browsers and caches that some URL query parameters (tracking, UTM, sort order) do not change the response. The cached entry for the canonical URL is reused for variants — fewer fetches, better prefetch hits, less duplicate work.

  • Compress text responses with brotli where supported, gzip everywhere else. zstd is emerging. Don't compress already-compressed media.

  • Web font loading Recommended

    Self-host WOFF2 fonts, subset them, preload the critical face, and use font-display: swap so text is readable while the font is still loading.

  • Inline the CSS needed for above-the-fold content and defer the rest. Render-blocking resources in <head> are the single biggest cause of slow first paint.

  • Choose the right script-loading attribute for every <script>: defer for app code, async for independent third-party, type=module for modern code. Bare <script> in <head> is almost always wrong.

  • HTTP/2 and HTTP/3 Recommended

    Serve over HTTP/2 at minimum and HTTP/3 where you can. HTTP/2 multiplexing removes HTTP-level head-of-line blocking; HTTP/3 runs over QUIC, which also removes it at the transport layer and shortens connection setup.

  • Speculation Rules Recommended

    Tell the browser which links to prefetch or prerender before the user clicks. Done well, navigations feel instant; done carelessly, you burn bandwidth on pages nobody visits.

  • Five resource hints — dns-prefetch, preconnect, preload, modulepreload, prefetch — cover every stage of the request lifecycle. Pick the right one for the job.

  • View Transitions Recommended

    Animate between states (same-document) or between pages (cross-document) with a single CSS opt-in. Replaces ad-hoc SPA animation libraries with a platform primitive.

  • Keep pages BFCache-eligible so back/forward navigation restores them instantly from memory, with no reload, no hydration, and no repaint.

  • Use `content-visibility` with `contain-intrinsic-size` to skip layout and paint for off-screen content, and Intersection Observer to drive lazy behaviour, instead of scroll and resize listeners.

  • CSS containment Optional

    Use `contain: layout paint style` (or the `contain: content` shorthand) to tell the browser that an element's internals cannot affect the rest of the page, so reflow and repaint stay isolated to that subtree.

  • Drive CSS animations from scroll position or element visibility with `scroll-timeline` and `view-timeline`, replacing JS scroll-listener libraries with compositor-thread animation.

  • Scrollbar gutter Recommended

    Use scrollbar-gutter: stable to reserve scrollbar space and stop horizontal layout shift between pages or states that overflow vs. don't.
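
The Core Web Vitals targets above are stated at the 75th percentile of real users, so a pass or fail decision needs a percentile over field samples rather than an average. The sketch below uses the nearest-rank method, which is an assumption; field-data tools may interpolate differently. The thresholds are the ones quoted in the checklist, and the sample values in the usage note are made up.

```python
import math

# Thresholds from the checklist: LCP in seconds, INP in milliseconds, CLS unitless.
THRESHOLDS = {"LCP": 2.5, "INP": 200, "CLS": 0.1}


def p75(samples: list[float]) -> float:
    """75th percentile by the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(0.75 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


def passes(metric: str, samples: list[float]) -> bool:
    """True when the 75th-percentile value meets the metric's threshold."""
    return p75(samples) <= THRESHOLDS[metric]
```

With the invented LCP samples `[1.2, 1.9, 2.1, 2.4, 3.8]`, the 75th percentile is the fourth value, so one slow outlier does not fail the page, whereas a mean-based check would be pulled upward by it.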

Privacy

Consent, signals, and respecting visitor choice.

  • Privacy policy Required

    A privacy policy tells visitors what personal data you collect, why, on what legal basis, who you share it with, how long you keep it, and what rights they have.

  • Cookie consent Required

    In the EU and UK, non-essential cookies and similar storage require freely given, informed, specific, and unambiguous opt-in consent before they are set.

  • Global Privacy Control is a browser-level signal that tells websites the user opts out of the sale or sharing of their personal data. California and Colorado require sites to honour it.

  • Every script loaded from another domain can read cookies, see the URL, and exfiltrate data from your page. Audit them, justify them, and lock them down.

  • You can measure traffic without surveilling visitors. Aggregate, cookieless, EU-hosted analytics tools answer most product questions without the consent and transfer problems of ad-tech analytics.

  • Data minimisation Recommended

    Collect only the personal data you actually need for a specific purpose, keep it only as long as you need it, and redact it from anywhere it leaks unnecessarily.
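
Honouring Global Privacy Control on the server can be as simple as reading one request header. Browsers that send the signal include `Sec-GPC: 1`. The sketch below combines that with a consent flag to decide which scripts to emit; the script paths and function names are hypothetical.

```python
def gpc_opt_out(headers: dict[str, str]) -> bool:
    """True when the request carries the Global Privacy Control signal."""
    for name, value in headers.items():
        if name.lower() == "sec-gpc":
            return value.strip() == "1"
    return False


def allowed_scripts(headers: dict[str, str], consented: bool) -> list[str]:
    """Pick the scripts to include for this request."""
    scripts = ["/js/site.js"]  # hypothetical first-party bundle, always loaded
    # Load the hypothetical sale/sharing tracker only with opt-in consent
    # and only when the visitor has not sent a GPC opt-out.
    if consented and not gpc_opt_out(headers):
        scripts.append("/js/ads.js")
    return scripts
```

Treating the GPC signal as overriding an earlier consent choice is a design decision in this sketch; which one should win in a given jurisdiction is a legal question this example does not settle.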

Resilience

Graceful failure — error pages, offline, redirects.

  • Custom error pages must return the correct HTTP status code, explain what went wrong in plain language, and offer the user a way forward without leaking implementation details.

  • When the site is intentionally offline, return HTTP 503 with a Retry-After header and a page that tells users what is happening and when to come back.

  • A service worker can serve a cached offline fallback page when the network fails, keeping the site usable on flaky connections and turning hard failures into graceful ones.

  • Web app manifest Recommended

    A web app manifest is a small JSON file that tells browsers how the site should appear when installed — its name, icons, start URL, theme colour, and display mode.

  • Monitoring and uptime Recommended

    Monitor the site from outside your own infrastructure, combine synthetic checks with real user data, and run a status page on a separate host so it stays up when the site does not.
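
The maintenance-mode item above (503 plus Retry-After) can be sketched as a pure function that a web framework handler would call. The function name, the page wording, and the choice to express Retry-After in seconds rather than as an HTTP date are assumptions of this example.

```python
from datetime import datetime, timezone


def maintenance_response(
    now: datetime, back_at: datetime
) -> tuple[int, dict[str, str], str]:
    """Return (status, headers, body) for a planned-maintenance window."""
    wait_seconds = max(0, int((back_at - now).total_seconds()))
    headers = {
        "Retry-After": str(wait_seconds),
        "Content-Type": "text/html; charset=utf-8",
    }
    body = (
        "<!doctype html><title>Down for maintenance</title>"
        "<p>We are doing planned maintenance and expect to be back by "
        f"{back_at:%H:%M} UTC.</p>"
    )
    return 503, headers, body


status, headers, body = maintenance_response(
    now=datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc),
    back_at=datetime(2025, 1, 1, 12, 30, tzinfo=timezone.utc),
)
print(status, headers["Retry-After"])
```

Passing `now` in as a parameter keeps the function deterministic and easy to test; a real handler would supply the current UTC time.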

Internationalisation

Language, locale, direction, and translated content.

  • hreflang tells search engines which language or regional version of a page to show to which user. It uses BCP 47 codes and must be reciprocal across all alternates.

  • Mark passages, phrases, and inline elements that differ from the document language with a lang attribute. WCAG 3.1.2 requires it so assistive tech can switch pronunciation.

  • Sites that serve Arabic, Hebrew, Persian, or Urdu must set dir="rtl" and use CSS logical properties so layouts mirror correctly without hard-coded left and right.

  • Locale-aware content Recommended

    Dates, numbers, currency, and units should be formatted in the user's locale. Use Intl APIs in the browser and the same locale data server-side so output matches expectations.

  • IDNs let domain names contain non-ASCII characters. They are encoded as Punycode on the wire and rendered as Unicode in the browser, subject to anti-spoofing rules.
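
The hreflang item above requires alternates to be reciprocal, which is easy to get wrong by hand. The sketch below checks a site map of annotations for missing return links. The data shape (each URL mapped to its `{hreflang: alternate URL}` annotations) and the example URLs are assumptions for illustration.

```python
def missing_return_links(pages: dict[str, dict[str, str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs where target does not link back to source."""
    missing = []
    for url, alternates in pages.items():
        for target in alternates.values():
            if target == url:
                continue  # a self-referencing annotation needs no return link
            if url not in pages.get(target, {}).values():
                missing.append((url, target))
    return missing


# Hypothetical site: the German page forgets to point back at the English one.
EN = "https://example.com/en/"
DE = "https://example.com/de/"
pages = {
    EN: {"en": EN, "de": DE},
    DE: {"de": DE},
}
print(missing_return_links(pages))
```

An empty result means every alternate in the crawl links back to the page that references it; a target that was never crawled is also reported, since its return link cannot be confirmed.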
