Declaring acceptable languages

Accept Language

Preliminaries

This note describes how the server handles the browser's accept-language request header, enabling search engines to match the visitor's preferred language to the website's natural written language.

When HTTP was defined, its creators had content negotiation in mind. The language negotiation mechanism follows the same pattern used to negotiate content-type, content-encoding, and authorization headers.

In theory, language negotiation works like this: the browser requests a document and tells the server which languages it is willing to accept, say French. The server responds with a French version of the document if one exists, and with status code 406 if it does not.

In practice, this scheme never caught on. Even the world's undisputed champion of multi-language documents, Wikipedia, does not use it for its 6 million articles, only for its landing page. Instead, websites that have multiple natural language versions of their documents serve them as discrete documents, at fixed URLs, typically located under subdomains or subdirectories. A hypothetical document about cheese may be available in English at https://en.example.com/cheese.html and in French at https://fr.example.com/fromage.html.

Browsers typically construct resource requests with an accept-language tag consisting of the '*' wildcard, instructing the server to respond with whatever document is available, regardless of language.

Because the header plays almost no role in true language negotiation, it is tempting to dismiss it and always serve the requested document, if it exists, regardless of the user's ability to understand it. Nevertheless, this request header still has a role.

In particular, search engine crawlers can use this header to request documents in a particular language and to ignore all others. Suppose, for example, that Yandex visits a site requesting only documents in the Russian language. A properly configured English website will return status code 406 without the body of the document, saving network bandwidth and time for both user-agent and server.

How it works

When a request's path matches a configured path-pattern, content negotiation is initiated. The configured language tag is compared to the request's accept-language header, which may contain more than one acceptable language, or the special '*' wildcard.

If the configured language tag matches any of the languages in the request header, then the request is processed with status code 200; the associated language tag is added to the content-language response header; and the document itself is sent in the response payload.

On the other hand, if none of the request header's languages matches the configured language tag, the response returns status code 406 with an rw-language-not-acceptable information header.

When all configured path-patterns have been searched, and none match the request, the information header rw-language-not-configured is added to the response and a status code 406 is returned.
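
The steps above can be sketched in Python. This is a minimal illustration, not the server's actual implementation. The function names are hypothetical, and it makes several assumptions: path-patterns behave like shell-style globs, the first matching path-pattern decides the outcome, a language range such as "zh" also covers the more specific tag "zh-Hans", quality weights (";q=") are ignored, and the values of the information headers are left empty as placeholders.

```python
from fnmatch import fnmatchcase


def parse_accept_language(header):
    """Return the language ranges listed in an accept-language header, lowercased."""
    ranges = []
    for item in header.split(","):
        # Drop any ";q=..." weight; this sketch does not rank by quality.
        tag = item.split(";", 1)[0].strip().lower()
        if tag:
            ranges.append(tag)
    return ranges


def language_matches(accepted, configured):
    """True if an accepted range covers the configured tag (assumed prefix rule)."""
    configured = configured.lower()
    return (
        accepted == "*"
        or accepted == configured
        or configured.startswith(accepted + "-")
    )


def negotiate(path, accept_language, entries):
    """Return (status, response_headers) for a request.

    entries is a list of (path_pattern, language_tag) pairs in configured order.
    """
    for pattern, tag in entries:
        if fnmatchcase(path, pattern):
            # Assumption: the first matching path-pattern decides the outcome.
            accepted = parse_accept_language(accept_language)
            if any(language_matches(r, tag) for r in accepted):
                return 200, {"content-language": tag}
            # Header value is a placeholder; the note only names the header.
            return 406, {"rw-language-not-acceptable": ""}
    # No configured path-pattern matched the request path.
    return 406, {"rw-language-not-configured": ""}
```

With entries equivalent to a subdirectory layout, negotiate("/ja/index.html", "ja,en;q=0.8", entries) yields status 200 with a content-language of ja, while a request for "/index.html" that accepts only "ru" yields 406.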

Configuration

The accept-language configuration section is used to declare which natural languages the server is able to serve. It comprises a collection of two-part entries: the left-hand side is a path-pattern, and the right-hand side is a language tag that adheres to the IETF RFC 5646 specification.

Refer to the separate note regarding Path Pattern rules.

Language negotiation is not attempted and the accept-language request header is completely ignored if the modules section does not explicitly enable the accept-language module.

Placement

The accept-language configuration sub-section may appear in a request section, subordinate to either the server section or a host section. Entries that occur in the host section will completely override entries in the server section; they are not merged.
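
The override rule can be pictured with a short Python sketch. The function name is hypothetical, and the fallback to the server entries when a host declares no accept-language sub-section (represented here as None) is an assumption rather than documented behavior.

```python
def effective_entries(server_entries, host_entries):
    """Pick the accept-language entries that apply to a host.

    Host entries replace the server entries wholesale; nothing is merged.
    """
    # Assumption: None means the host has no accept-language sub-section,
    # in which case the server-level entries apply.
    if host_entries is None:
        return server_entries
    return host_entries
```

For example, a server-level entry list of [("*", "en")] combined with a host-level list of [("/ja/*", "ja")] leaves only the host-level entry in effect, so paths outside /ja/ would no longer match any configured path-pattern for that host.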

EBNF

SP ::= U+20
CR ::= U+0D
ASTERISK ::= U+2A
QUESTION-MARK ::= U+3F
SOLIDUS ::= U+2F
EQUALS-SIGN ::= U+3D
GRAVE-ACCENT ::= U+60
LEFT-CURLY-BRACKET ::= U+7B
RIGHT-CURLY-BRACKET ::= U+7D
file-system-chars ::= (ALPHA | DIGIT | †)*
wildcards ::= ASTERISK | QUESTION-MARK
path-pattern ::= (SOLIDUS | file-system-chars | wildcards)*
delimited-path-pattern ::= GRAVE-ACCENT path-pattern GRAVE-ACCENT
rfc5646tag ::= ALPHA* ††
language-attribute ::= ASTERISK 'lang' EQUALS-SIGN rfc5646tag
accept-language-entry ::= delimited-path-pattern SP language-attribute CR
accept-language-section ::= 'accept-language' SP LEFT-CURLY-BRACKET CR
                                accept-language-entry*
                            RIGHT-CURLY-BRACKET CR

† Legal file system characters vary by platform

†† See RFC 5646 for language tag rules
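
As an illustration of the accept-language-entry production, the following Python sketch splits one entry line into its path-pattern and language tag. The names ENTRY and parse_entry are hypothetical. It assumes that leading indentation may be ignored and that a tag consists of letters, digits, and hyphens, which is looser than the full RFC 5646 rules.

```python
import re

# GRAVE-ACCENT path-pattern GRAVE-ACCENT SP ASTERISK 'lang' EQUALS-SIGN tag
ENTRY = re.compile(r"^`(?P<pattern>[^`]*)` \*lang=(?P<tag>[A-Za-z0-9-]+)$")


def parse_entry(line):
    """Return (path_pattern, language_tag) for one accept-language entry line."""
    # Assumption: surrounding whitespace (indentation, line ending) is not significant.
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError("not a valid accept-language entry: " + repr(line))
    return match.group("pattern"), match.group("tag")
```

Applied to the line `/zh-hans/*` *lang=zh-Hans, the function returns the pair ("/zh-hans/*", "zh-Hans"). A line that lacks the delimited path-pattern raises ValueError.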

Cookbook

Example 1: All documents are English

server {
    modules {
        accept-language on
    }
    request {
        accept-language {
            `*` *lang=en
        }
    }
}

Example 2: Subdirectories used to organize by language

server {
    modules {
        accept-language on
    }
    request {
        accept-language {
            `/ja/*` *lang=ja
            `/zh-hans/*` *lang=zh-Hans
            `/zh-hant/*` *lang=zh-Hant
            `*` *lang=en
        }
    }
}

Review

Key points to remember:

  • The accept-language module should be enabled for most production websites.
  • The accept-language and content-language headers serve a useful purpose for search engine crawlers.
