Declaring acceptable languages

Accept Language

Preliminaries

This note describes how the server handles the browser's accept-language request header, enabling search engines to match the visitor's preferred language to the website's natural written language.

When the HTTP protocol was defined, the concept of content negotiation was on the minds of the creators. The specification of a language negotiation protocol followed a pattern similar to how content-type, content-encoding, and authorization headers are negotiated.

In theory, language negotiation was to be carried out between browser and server as follows: the browser requests a document and informs the server that it is willing to accept, say, French. The server responds with a French version if one exists, and replies with status code 406 if not.

In practice, this was never successfully employed. Even the world's undisputed champion of multi-language documents, Wikipedia, does not use it for its 6 million articles, only for its landing page. Instead, websites that have multiple natural language versions of their documents serve them as discrete documents at fixed URLs, typically located under subdomains or subdirectories. For example, a hypothetical document about cheese may be available in English at https://en.example.com/cheese.html and in French at https://fr.example.com/fromage.html.

Browsers typically send an accept-language header listing the user's preferred languages, often with quality weights, and few servers act on it. A header consisting solely of the '*' wildcard instructs the server to respond with whatever document is available, regardless of language.

Because of its almost non-existent role in true language negotiation, it is tempting to dismiss this header and always serve the document requested, if it exists, regardless of the user's ability to understand it. Nevertheless, there is still a role for this request header.

In particular, search engine crawlers can use this header to request documents in a particular language and to ignore all others. Suppose, for example, that Yandex visits a site requesting only documents in the Russian language. A properly configured English website will return status code 406, without the body of the document, saving network bandwidth and time for both user-agent and server.
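The crawler side of this exchange can be sketched in Python using only the standard library. This is a minimal illustration, not how any particular crawler is implemented; the function names build_language_request and fetch_if_acceptable are hypothetical.

```python
import urllib.error
import urllib.request


def build_language_request(url, language):
    """Build a GET request that declares a single acceptable language."""
    return urllib.request.Request(url, headers={"Accept-Language": language})


def fetch_if_acceptable(url, language):
    """Return the document body, or None when the server answers 406."""
    request = build_language_request(url, language)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 406:
            # No version in the requested language; no body was transferred.
            return None
        raise
```

A crawler indexing only Russian documents would call fetch_if_acceptable(url, "ru") and skip every URL for which it returns None.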

How it works

When a request's path matches a configured path-pattern, content negotiation is initiated. The configured language tag is compared to the request's accept-language header, which may contain more than one acceptable language, or the special '*' wildcard.

If the configured language tag matches any of the languages in the request header, the request is processed with status code 200, the language tag is added to the content-language response header, and the document itself is sent in the response payload.

On the other hand, if none of the request header's languages matches the configured language tag, the response returns status code 406 with an rw-language-not-acceptable information header.

When all configured path-patterns have been searched, and none match the request, the information header rw-language-not-configured is added to the response and a status code 406 is returned.
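The decision procedure described above can be sketched in Python. This is an illustration under several assumptions, not the server's actual code: Python's fnmatch stands in for the real path-pattern rules, the first matching pattern wins, a language range matches a configured tag when it is equal to it or is a prefix of it ending at a hyphen, quality weights are ignored, and the values of the information headers are placeholders. The function names are hypothetical.

```python
from fnmatch import fnmatchcase


def parse_accept_language(header):
    """Return the language ranges in the header, lower-cased, weights dropped."""
    ranges = []
    for part in header.split(","):
        language_range = part.split(";", 1)[0].strip().lower()
        if language_range:
            ranges.append(language_range)
    return ranges


def range_matches(language_range, configured_tag):
    """True for '*', an exact match, or a prefix that ends at a '-' boundary."""
    tag = configured_tag.lower()
    return (
        language_range == "*"
        or language_range == tag
        or tag.startswith(language_range + "-")
    )


def negotiate(entries, path, accept_language):
    """Return (status, headers) for a request path and accept-language value."""
    for pattern, tag in entries:
        if fnmatchcase(path, pattern):  # assumption: first matching pattern wins
            ranges = parse_accept_language(accept_language)
            if any(range_matches(r, tag) for r in ranges):
                return 200, {"content-language": tag}
            # Header value is a placeholder; the note names only the header.
            return 406, {"rw-language-not-acceptable": ""}
    return 406, {"rw-language-not-configured": ""}
```

With the entries [("/ja/*", "ja"), ("*", "en")], a request for /cheese.html that accepts only "ru" yields 406 with rw-language-not-acceptable, while the same request with "*" yields 200 with content-language set to en.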

Configuration

The accept-language configuration section is used to declare which natural languages the server is able to serve. It comprises a collection of two-part entries: the left-hand side is a path-pattern, and the right-hand side is a language tag that adheres to the IETF RFC 5646 specification.

Refer to the separate note regarding Path Pattern rules.

Language negotiation is not attempted and the accept-language request header is completely ignored if the modules section does not explicitly enable the accept-language module.

Placement

The accept-language configuration sub-section may appear in a request section, subordinate to either the server section or a host section. Entries that occur in the host section will completely override entries in the server section; they are not merged.
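The override rule can be modelled in a few lines of Python. This is only a sketch of the semantics: the function name effective_entries is hypothetical, and the fallback to the server entries when a host declares none is an assumption.

```python
def effective_entries(server_entries, host_entries):
    """Return the entries in force for a host.

    Host entries replace server entries wholesale; the two lists are never
    merged. Assumption: a host with no entries inherits the server's.
    """
    return list(host_entries) if host_entries else list(server_entries)
```

One consequence is that a catch-all entry declared only in the server section is not in force for a host that declares its own entries, so the host section needs to repeat it if it is still wanted.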

EBNF

SP	::=	U+20
CR	::=	U+0D
ASTERISK	::=	U+2A
QUESTION-MARK	::=	U+3F
SOLIDUS	::=	U+2F
EQUALS-SIGN	::=	U+3D
GRAVE-ACCENT	::=	U+60
LEFT-CURLY-BRACKET	::=	U+7B
RIGHT-CURLY-BRACKET	::=	U+7D
file-system-chars	::=	(ALPHA | DIGIT | ^†)*
wildcards	::=	ASTERISK | QUESTION-MARK
path-pattern	::=	(SOLIDUS | file-system-chars | wildcards)*
delimited-path-pattern	::=	GRAVE-ACCENT path-pattern GRAVE-ACCENT
rfc5646tag	::=	ALPHA* ^††
language-attribute	::=	ASTERISK 'lang' EQUALS-SIGN rfc5646tag
accept-language-entry	::=	delimited-path-pattern SP language-attribute CR
accept-language-section	::=	'accept-language' SP LEFT-CURLY-BRACKET CR accept-language-entry* RIGHT-CURLY-BRACKET CR

† Legal file system characters vary by platform

†† See RFC 5646 for language tag rules
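As an illustration of the entry grammar, the following Python sketch parses a single accept-language-entry line into its path-pattern and language tag. It is not the server's parser. It assumes that leading indentation and runs of spaces are tolerated, as in the Cookbook examples, and it accepts letters, digits, and hyphens in the tag as a loose approximation of RFC 5646. The names ENTRY and parse_entry are hypothetical.

```python
import re

# GRAVE-ACCENT path-pattern GRAVE-ACCENT, spaces, then ASTERISK 'lang' '=' tag.
ENTRY = re.compile(r"^\s*`(?P<pattern>[^`]*)`\s+\*lang=(?P<tag>[A-Za-z0-9-]+)\s*$")


def parse_entry(line):
    """Return (path_pattern, language_tag) for one configuration entry line."""
    match = ENTRY.match(line)
    if match is None:
        raise ValueError(f"not an accept-language entry: {line!r}")
    return match.group("pattern"), match.group("tag")
```

A line that omits the grave-accent delimiters or the leading asterisk of the language attribute is rejected with ValueError.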

Cookbook

Example 1: All documents are English

server {
    modules {
        accept-language on
    }
    request {
        accept-language {
            `*`   *lang=en
        }
    }
}

Example 2: Subdirectories used to organize by language

server {
    modules {
        accept-language on
    }
    request {
        accept-language {
            `/ja/*`       *lang=ja
            `/zh-hans/*`  *lang=zh-Hans
            `/zh-hant/*`  *lang=zh-Hant
            `*`           *lang=en
        }
    }
}

Review

Key points to remember:

The accept-language module should be enabled for most production websites.
The accept-language and content-language headers serve a useful purpose for search engine crawlers.

Path Patterns

Specification of path-pattern globbing rules

This note documents the rules used by the server for all configuration entries that use the path-pattern idiom.

path-patterns, globstar, asterisk, question mark, wildcard, GRAVE-ACCENT delimiters

Content Negotiation

Balancing what's acceptable with what's possible

This note describes how browsers and servers proclaim their ability to handle different file formats, allowing the browser to request and receive only files that it is able to understand.

RWSERVE, content negotiation, accept-type, content-type, MIME-types, rw-no-acceptable-type, IETF RFC 6838

Headers

In-depth configuration rules for HTTP request/response cycle

These notes provide information on how to configure the server to properly inform user-agents of the kinds of content the server is able to provide.

RWSERVE, content negotiation, language, types, allow, cache, charsets, encoding, negotiation, etag, MIME, range