Website Spec
Internationalisation | Optional | Updated 2026-05-29

Internationalised Domain Names (IDN)

IDNs let domain names contain non-ASCII characters. They are encoded as Punycode on the wire and rendered as Unicode in the browser, subject to anti-spoofing rules.

What it is

Internationalised Domain Names allow labels in a hostname to contain characters outside ASCII — Cyrillic, Greek, Han, Arabic, accented Latin, and so on. münchen.de, 日本語.jp, and παράδειγμα.gr are all valid.

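DNS hostnames are restricted in practice to ASCII letters, digits, and hyphens, so each non-ASCII label is encoded with Punycode and prefixed with xn--. The encoded form is called the A-label and the Unicode form the U-label. The browser shows the U-label to the user where its display policy allows; the resolver, certificate, and Host header see the A-label.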

User sees:    münchen.de
On the wire:  xn--mnchen-3ya.de
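
The mapping above can be reproduced with Python's built-in punycode codec. This is a minimal sketch, not a full IDNA 2008 implementation: the helper names are hypothetical, and lowercasing plus NFC normalisation stands in for the real mapping and validation steps.

```python
import unicodedata

ACE_PREFIX = "xn--"  # marks a Punycode-encoded label


def label_to_ascii(label: str) -> str:
    """Encode one label; ASCII labels pass through lowercased."""
    # Assumption: lowercase + NFC is a rough stand-in for IDNA mapping.
    label = unicodedata.normalize("NFC", label.lower())
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def label_to_unicode(label: str) -> str:
    """Decode one label if it carries the xn-- prefix."""
    if label.lower().startswith(ACE_PREFIX):
        return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    return label


def hostname_to_ascii(hostname: str) -> str:
    return ".".join(label_to_ascii(part) for part in hostname.split("."))


def hostname_to_unicode(hostname: str) -> str:
    return ".".join(label_to_unicode(part) for part in hostname.split("."))


print(hostname_to_ascii("münchen.de"))           # xn--mnchen-3ya.de
print(hostname_to_unicode("xn--mnchen-3ya.de"))  # münchen.de
```

Each label is encoded on its own, which is why only the non-ASCII label gains the xn-- prefix and the TLD is left untouched.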

The current standard is IDNA 2008 (RFCs 5890–5894), with Unicode Technical Standard #46 (UTS #46) defining the compatibility processing browsers actually use.

Why it matters

For users outside the ASCII-Latin world, a domain in their own script is more memorable, more brandable, and easier to type than a transliteration. For everyone, the security implications matter: visually similar characters across scripts (“paypal” written with a Cyrillic “а” instead of Latin “a”) enable homograph attacks, where a malicious domain impersonates a real one. Browsers therefore apply display rules that decide when to show the Unicode form and when to fall back to Punycode in the address bar.

How to implement

Most sites only consume IDNs; few operate them. Either way, get the basics right.

If you own an IDN:

  • Register both the IDN and an ASCII fallback, pick one as primary, and 301-redirect the other to it so links and analytics consolidate.
  • Get a TLS certificate that covers both names. Certificates carry the IDN as its A-label (xn--…); modern CAs handle this automatically when you submit the A-label. Verify that the certificate’s dNSName SAN entries include the encoded form.
  • Set <link rel="canonical"> to one form (typically the Unicode form) so search engines pick a single representation.
  • Test mail flow. Non-ASCII local parts need Internationalised Email (the SMTPUTF8 extension, RFC 6531), which is a separate spec and patchily supported.
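
The certificate check in the list above can be sketched as a small pure function. This is an illustration under stated assumptions: san_covers and to_a_label_host are hypothetical names, the list of dNSName entries is assumed to have been extracted from the certificate already, and Python's built-in idna codec (which implements the older IDNA 2003) stands in for a full IDNA 2008 + UTS #46 library.

```python
def to_a_label_host(hostname: str) -> str:
    """Convert a hostname to its ASCII (A-label) form.

    Assumption: the stdlib "idna" codec (IDNA 2003) is close enough for
    this illustration; production code should use an IDNA 2008 library.
    """
    return hostname.encode("idna").decode("ascii").lower()


def san_covers(hostname: str, dns_names: list[str]) -> bool:
    """Return True if any dNSName entry matches the hostname's A-label."""
    target = to_a_label_host(hostname)
    for name in dns_names:
        name = name.lower()
        if name == target:
            return True
        if name.startswith("*."):
            # A wildcard stands in for exactly one left-most label.
            _, _, parent = target.partition(".")
            if parent and parent == name[2:]:
                return True
    return False


sans = ["xn--mnchen-3ya.de", "muenchen.de"]
print(san_covers("münchen.de", sans))       # True
print(san_covers("münchen.de", sans[1:]))   # False
```

The comparison is always done on the A-label, because that is the only form a certificate's dNSName entries can contain.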

If your site accepts hostnames as input (email, URLs, webhooks):

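  • Normalise with IDNA 2008 + UTS #46 before storing. Use a library (the idna package in Python, url.domainToASCII in Node, ICU4J’s UTS #46 support on the JVM) rather than hand-rolling the algorithm. Note that java.net.IDN in the JDK implements the older IDNA 2003 rules.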
  • Store the A-label (xn--…) as the canonical key and display the U-label (Unicode) to users.
  • Apply your platform’s spoofing protection: mixed-script detection and confusable-character checks against the Unicode confusables data (UTS #39).
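
The store-and-display split described above might look like the following. It is a sketch only: normalise_hostname, display_form, register, and the in-memory accounts dictionary are hypothetical, and the stdlib idna codec (IDNA 2003) is used as a stand-in so the example runs without third-party packages. A real system would swap in an IDNA 2008 + UTS #46 library at the marked line.

```python
def normalise_hostname(raw: str) -> str:
    """Return the A-label form used as the storage key."""
    host = raw.strip().rstrip(".")
    # Assumption: stdlib "idna" codec (IDNA 2003) as a stand-in here.
    # The codec leaves ASCII labels as typed, hence the final lower().
    return host.encode("idna").decode("ascii").lower()


def display_form(a_label_host: str) -> str:
    """Return the U-label form for showing to users."""
    return a_label_host.encode("ascii").decode("idna")


accounts: dict[str, str] = {}  # hypothetical store keyed by A-label


def register(raw_host: str, owner: str) -> str:
    key = normalise_hostname(raw_host)
    if key in accounts:
        raise ValueError(f"{display_form(key)} is already registered")
    accounts[key] = owner
    return key


key = register("MÜNCHEN.de", "alice")
print(key)                # xn--mnchen-3ya.de
print(display_form(key))  # münchen.de
```

Because every input passes through the same function before it touches storage, differently typed spellings of one hostname collapse to a single key.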

Browser display rules. Chromium, Firefox, and Safari each maintain a policy: show Unicode if the label is in a single script (with a few mixed exceptions), or if the user’s configured languages include that script. Otherwise show Punycode. You cannot override this from a website, and you should not try — it is the user’s protection against homograph attacks. Test your IDN in each browser and accept that some users will see xn--mnchen-3ya.de.
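
To make the single-script idea concrete, here is a deliberately simplified sketch. It is not any browser's actual algorithm: script_of guesses the script from the first word of the Unicode character name (an assumption made because the standard library does not expose the Script property), and all function names are hypothetical.

```python
import unicodedata


def script_of(ch: str) -> str:
    """Rough script guess based on the Unicode character name."""
    if ch.isdigit() or ch == "-":
        return "COMMON"  # digits and hyphens fit any script
    # e.g. "CYRILLIC SMALL LETTER A" -> "CYRILLIC"
    return unicodedata.name(ch, "UNKNOWN").split(" ")[0]


def scripts_in(label: str) -> set[str]:
    return {script_of(ch) for ch in label} - {"COMMON"}


def display_label(u_label: str) -> str:
    """Show Unicode for single-script labels, Punycode otherwise."""
    if len(scripts_in(u_label)) <= 1:
        return u_label
    return "xn--" + u_label.encode("punycode").decode("ascii")


spoof = "p\u0430ypal"  # second letter is Cyrillic U+0430, not Latin "a"
print(sorted(scripts_in(spoof)))  # ['CYRILLIC', 'LATIN']
print(display_label(spoof))       # falls back to an xn-- form
print(display_label("münchen"))   # münchen
```

A check this simple misses labels written entirely in one look-alike script and wrongly flags legitimate combinations such as Han with Hiragana, which is why real policies add confusable tables and allowed-combination lists.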

Common mistakes

  • Comparing hostnames as raw Unicode strings, missing case folding, NFC normalisation, and IDNA mapping. Always normalise to the A-label first.
  • Accepting an IDN at registration but not on login because two libraries normalise differently.
  • Assuming the address bar always shows Unicode. Browser policy may downgrade to Punycode, which is fine.
  • Issuing a certificate only for the U-label. Certificates must cover the A-label encoding.
  • Treating IDN as a green light for arbitrary Unicode. IDNA 2008 disallows many code points (symbols, emoji) and permits joiners only in specific contexts, for safety.
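
The first mistake in the list is easy to demonstrate. In this sketch, canonical_host and same_host are hypothetical helpers, and the stdlib idna codec (IDNA 2003) again stands in for an IDNA 2008 + UTS #46 library.

```python
def canonical_host(hostname: str) -> str:
    """Reduce a hostname to its lowercase A-label form."""
    # Assumption: stdlib "idna" codec as a stand-in for IDNA 2008.
    return hostname.encode("idna").decode("ascii").lower()


def same_host(a: str, b: str) -> bool:
    return canonical_host(a) == canonical_host(b)


composed = "m\u00fcnchen.de"     # precomposed u-with-diaeresis (NFC)
decomposed = "mu\u0308nchen.de"  # "u" + combining diaeresis (NFD)

print(composed == decomposed)           # False: raw strings differ
print(same_host(composed, decomposed))  # True
print(same_host("M\u00dcNCHEN.DE", "xn--mnchen-3ya.de"))  # True
```

The two spellings render identically on screen yet differ as code-point sequences, so only the comparison on the normalised form is reliable.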
