Declaring acceptable languages
This note describes how the server handles the browser's accept-language request header, enabling search engines to match the visitor's preferred language to the website's natural written language.
When the HTTP protocol was defined, the concept of content negotiation was on the minds of the creators. The specification of a language negotiation protocol followed a pattern similar to how content-type, content-encoding, and authorization headers are negotiated.
In theory, language negotiation was to be carried out between the browser and the server: the browser would request a document and inform the server that it is willing to accept, say, French; the server would respond with a French version if one existed, and with status code 406 if it did not.
In practice, this was never successfully employed. Even the world's undisputed champion of multi-language documents, Wikipedia, does not use it on its 6 million articles, only on its landing page. Instead, websites that have multiple natural-language versions of their documents serve them as discrete documents, at fixed URLs, typically located under subdomains or subdirectories. A hypothetical document about cheese may be available in English at
https://en.example.com/cheese.html and in French at
Browsers typically construct resource requests with an accept-language tag consisting of the
'*' wildcard, instructing the server to respond with whatever document is available, regardless of language.
Because of its almost non-existent role in true language negotiation, it is tempting to dismiss this header and always serve the document requested, if it exists, regardless of the user's ability to understand it. Nevertheless, there is still a role for this request header.
In particular, search engine crawlers can use this header to request documents in a particular language and to ignore all others. Suppose, for example, that Yandex visits a site requesting only documents in the Russian language. A properly configured English website will return status code 406, without the body of the document, saving network bandwidth and time for both user-agent and server.
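The crawler's side of this exchange can be sketched in a few lines of Python. This is only an illustration: the helper names build_language_request and should_index are hypothetical, and the URL is the example address used earlier in this note. No network traffic is sent; the sketch only constructs the request and shows how the two status codes would be interpreted.

```python
from urllib.request import Request


def build_language_request(url: str, language: str) -> Request:
    """Build a GET request that declares a single acceptable language.

    Hypothetical helper: a real crawler would also set a user-agent, etc.
    """
    return Request(url, headers={"Accept-Language": language}, method="GET")


def should_index(status: int) -> bool:
    """Interpret the response: 200 carries a document, 406 carries no body."""
    return status == 200


# A crawler that wants only Russian documents.
req = build_language_request("https://en.example.com/cheese.html", "ru")

# urllib stores header names capitalized as "Accept-language".
print(req.get_header("Accept-language"))
print(should_index(200), should_index(406))
```

If such a request were actually sent with urllib.request.urlopen, a 406 response would surface as an HTTPError whose code attribute holds the status.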
How it works
When a request's path matches a configured path-pattern, content negotiation is initiated. The configured language tag is compared to the request's
accept-language header, which may contain more than one acceptable language, or the special
If the comparison between the configured match and any of the request header's languages succeeds, then the request is processed with status code
200; the associated language tag is added to the
content-language response header; and the document itself is sent in the response payload.
On the other hand, if none of the request header's languages matches the configured match, the response returns status code
406 with an
rw-language-not-acceptable information header.
When all configured path-patterns have been searched, and none match the request, the information header
rw-language-not-configured is added to the response and a status code
406 is returned.
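The decision steps above can be sketched as a small Python function. This is a simplified model, not the server's implementation, and it rests on several assumptions: path-patterns are matched with Python's fnmatch rules (the real rules are in the separate Path Pattern note), the first matching pattern wins, language tags are compared case-insensitively for exact equality, quality values (q=) are ignored, and the values placed in the two rw-* information headers are placeholders. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_accept_language(header: str) -> list[str]:
    """Return the header's language ranges, lowercased, with q-values dropped."""
    ranges = []
    for item in header.split(","):
        tag = item.split(";", 1)[0].strip().lower()
        if tag:
            ranges.append(tag)
    return ranges


def negotiate(path: str, header: str, entries: list[tuple[str, str]]):
    """Return (status, response_headers) for a request path and header.

    entries is an ordered list of (path-pattern, language tag) pairs.
    """
    accepted = parse_accept_language(header)
    for pattern, language in entries:
        if not fnmatchcase(path, pattern):
            continue  # keep searching the configured path-patterns
        if "*" in accepted or language.lower() in accepted:
            return 200, {"content-language": language}
        # Header value is an assumption; the note only names the header.
        return 406, {"rw-language-not-acceptable": language}
    # No configured path-pattern matched the request (value is a placeholder).
    return 406, {"rw-language-not-configured": path}


entries = [("/fr/*", "fr"), ("/en/*", "en")]
print(negotiate("/fr/cheese.html", "fr-CH, fr;q=0.9", entries))
print(negotiate("/en/cheese.html", "ru", entries))
print(negotiate("/de/cheese.html", "de", entries))
```

The three calls exercise the three outcomes in turn: a match with status 200, a configured path whose language is not acceptable, and a path that no pattern covers.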
The accept-language configuration section is used to declare which natural languages the server is able to serve. It comprises a collection of two-part entries: the left-hand side is a path-pattern, and the right-hand side is a language tag that adheres to the IETF RFC 5646 specification.
Refer to the separate note regarding Path Pattern rules.
Language negotiation is not attempted and the
accept-language request header is completely ignored if the
modules section does not explicitly enable the
accept-language configuration sub-section may appear in a
request section, subordinate to either the
server section or a
host section. Entries that occur in the
host section will completely override entries in the
server section; they are not merged.
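The override rule can be illustrated with a short Python sketch. The function name effective_entries is hypothetical, and one detail is an assumption: the sketch treats a host section with no entries the same as one that is absent, which the text above does not specify.

```python
def effective_entries(server_entries, host_entries):
    """Pick the entries that apply to a request for a given host.

    Host entries replace the server entries wholesale; nothing is merged.
    Assumption: an empty or missing host list falls back to the server list.
    """
    if host_entries:
        return list(host_entries)
    return list(server_entries)


server = [("*", "en")]
host = [("/fr/*", "fr")]

print(effective_entries(server, host))  # only the host entry applies
print(effective_entries(server, None))  # falls back to the server entries
```

Note that in the first call the server-level entry for "*" no longer applies at all, so a path outside /fr/ on that host would be treated as not configured.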
file-system-chars        ::=  (ALPHA | DIGIT | †)*
wildcards                ::=  ASTERISK | QUESTION-MARK
path-pattern             ::=  (SOLIDUS | file-system-chars | wildcards)*
delimited-path-pattern   ::=  GRAVE-ACCENT path-pattern GRAVE-ACCENT
language-attribute       ::=  ASTERISK 'lang' EQUALS-SIGN rfc5646tag ††
accept-language-entry    ::=  delimited-path-pattern SP language-attribute CR
accept-language-section  ::=  'accept-language' SP LEFT-CURLY-BRACKET CR
† Legal file system characters vary by platform
†† See RFC 5646 for language tag rules
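To make the accept-language-entry rule concrete, here is a minimal Python sketch that parses one entry line. It is an approximation of the grammar, not the server's parser: the function name parse_entry is hypothetical, any character other than a grave accent is accepted inside the path-pattern (legal characters vary by platform), surrounding whitespace is tolerated, and the language tag is checked only for the general shape of hyphen-separated alphanumeric subtags rather than full RFC 5646 validity.

```python
import re

# GRAVE-ACCENT path-pattern GRAVE-ACCENT SP ASTERISK 'lang' EQUALS-SIGN tag
ENTRY_RE = re.compile(
    r"`(?P<pattern>[^`]*)` \*lang=(?P<tag>[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*)"
)


def parse_entry(line: str) -> tuple[str, str]:
    """Split one entry line into (path-pattern, language tag)."""
    match = ENTRY_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not an accept-language entry: {line!r}")
    return match.group("pattern"), match.group("tag")


print(parse_entry("`/fr/*` *lang=fr"))
print(parse_entry("`*` *lang=en-US"))
```

A line that omits the grave accents or the asterisk before lang raises ValueError instead of being silently accepted.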
Example 1: All documents are English
Example 2: Subdirectories used to organize by language
Key points to remember:
The accept-language module should be enabled for most production websites.
content-language headers serve a useful purpose for search engine crawlers.