Declaring which character set to use with documents, style sheets, and text files
This note describes how character set declarations can be added to response headers for documents, style sheets, and text files.
Most new text documents served on the web today are prepared using the Unicode character set with UTF-8 encoding, but not all. The transition from the ISO/IEC 8859-x character sets to UTF-8 has been underway for years and will continue indefinitely. The same is true of the Windows-125x character sets. And while older DOS-based code-page encodings (CP437 and its cousins) are rarely seen any more, the Shift_JIS, Big5, and EUC-KR character encodings for Japanese, Chinese, and Korean are still in active use.
It is a common practice to add a meta tag like this to an HTML document's head:
<meta charset="UTF-8">
or a rule like this at the top of a CSS file:
@charset "UTF-8";
or an attribute added to a
or an explicit declaration added to an XML document:
<?xml version="1.0" encoding="UTF-8"?>
These are necessary only when a server fails to properly declare, in its response headers, which character set is used for the content.
When the server is properly configured, these statements can be removed, because the server tells the browser directly, via response headers, which character set encoding applies.
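One way to check whether a server already declares the character set is to inspect the value of its content-type response header. The following is a minimal sketch using Python's standard library; the function name declared_charset is hypothetical and is not part of any server API.

```python
from email.message import Message


def declared_charset(content_type_value):
    """Return the charset parameter of a content-type header value, or None.

    get_param() preserves the capitalization sent by the server.
    """
    msg = Message()
    msg["Content-Type"] = content_type_value
    return msg.get_param("charset")


print(declared_charset("text/html; charset=UTF-8"))
print(declared_charset("text/html"))
```

If the function returns None for a text document, the server is not declaring a character set, and an in-document declaration is still needed.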
Character set identifiers should always be used exactly as specified, with proper capitalization and hyphenation. For example, UTF-8, ISO-8859-1, and Shift_JIS are all properly specified. Lowercase variants should not be used.
For more about character set identifiers, refer to IETF RFC 2978, "IANA Charset Registration Procedures".
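The capitalization guidance can be illustrated with a small lookup that maps any spelling of an identifier to its registered form. This is only a sketch: the CANONICAL tuple is a short illustrative sample rather than the full IANA registry, and the function name is hypothetical.

```python
# A small illustrative sample of registered identifiers (not the full registry).
CANONICAL = ("UTF-8", "ISO-8859-1", "Shift_JIS", "Big5", "EUC-KR")

# Map the lowercased spelling of each identifier to its registered spelling.
_LOOKUP = {name.lower(): name for name in CANONICAL}


def canonical_identifier(name):
    """Return the registered spelling of a charset identifier, or None if unknown."""
    return _LOOKUP.get(name.lower())


for candidate in ("utf-8", "shift_jis", "ISO-8859-1"):
    proper = canonical_identifier(candidate)
    if proper != candidate:
        print(f"{candidate!r} should be written as {proper!r}")
```

A check like this could be run over a configuration before deployment to catch identifiers that differ from the registered spelling only by case.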
The charset configuration section is used to declare character set identifiers for all documents of a given MIME-type. It comprises a collection of two-part entries: the left-hand side is a MIME-type, and the right-hand side is the character set identifier.
If the website has some documents encoded in one character set and other documents (of the same MIME-type) encoded in a different character set, the charset configuration section should not be used, and the common practice outlined above should be followed.
When a requested file is served, the declared charset identifier is appended to the content-type response header. For example, an HTML file encoded in UTF-8 would have a header of
content-type: text/html; charset=UTF-8
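The appending step can be sketched as follows. This is an illustration of the behavior described here, not the server's actual implementation; the function name and the dictionary-based representation of the configured entries are assumptions.

```python
def content_type_header(mime_type, charset_map):
    """Return the content-type header value for a response, or None.

    charset_map is a hypothetical dict of MIME-type -> charset identifier,
    standing in for the configured charset entries.
    """
    if mime_type is None:
        # No content-type header is generated, so no charset can be declared.
        return None
    charset = charset_map.get(mime_type)
    if charset is None:
        # No entry configured for this MIME-type: send the type alone.
        return mime_type
    return f"{mime_type}; charset={charset}"


configured = {"text/html": "UTF-8", "text/css": "UTF-8"}
print(content_type_header("text/html", configured))
print(content_type_header("image/png", configured))
```

MIME-types without a configured entry pass through unchanged, which matches the idea that a charset is only declared where one has been configured.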
Charsets should only be configured for text and application media types; they are not meaningful for other media types, such as image, audio, or video.
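Note that each content type has its own rules regarding which charsets may be used. For example, JSON was limited to UTF-8, UTF-16, or UTF-32 by RFC 7159, and RFC 8259 now requires UTF-8 for JSON exchanged between systems that are not part of a closed ecosystem.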
The charset configuration section may appear in either the server/response subsection or a host/response subsection. When values occur in both the server/response and host/response subsections, they are merged according to the server's standard merging rules.
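The effect of merging can be pictured with a short sketch. The precedence shown here, where a host-level entry overrides a server-level entry for the same MIME-type, is an assumption made purely for illustration; the actual merge rules are defined by the server's documentation and may differ.

```python
def merge_charsets(server_entries, host_entries):
    """Combine server-level and host-level charset entries.

    ASSUMPTION: host-level entries take precedence over server-level ones.
    """
    merged = dict(server_entries)   # start from the server-wide entries
    merged.update(host_entries)     # host entries replace matching MIME-types
    return merged


server_level = {"text/html": "UTF-8", "text/css": "UTF-8"}
host_level = {"text/html": "ISO-8859-1"}
print(merge_charsets(server_level, host_level))
```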
media-type          ::= 'text' | 'application'
subtype             ::= (ALPHA | DIGIT | †)*
MIME-type           ::= media-type SOLIDUS subtype
charset-identifier  ::= (ALPHA | DIGIT | ††)*
charset-entry       ::= MIME-type SP charset-identifier CR
charset-section     ::= 'charset' SP LEFT-CURLY-BRACKET CR
† See section 4.2 of RFC 6838 for exact rules
†† See IETF RFC 2978 for guidance
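To make the grammar concrete, here is a rough sketch of a parser for a charset section. It is not the server's parser. It assumes the section is closed by a right curly bracket on its own line (the closing part of the production is not shown above), it tolerates leading indentation, and its character classes follow the restricted-name characters of RFC 6838 and the charset name characters of RFC 2978. The function name is hypothetical.

```python
import re

# MIME-type SP charset-identifier, with media-type limited to text/application.
ENTRY = re.compile(
    r"^(text|application)/([A-Za-z0-9][A-Za-z0-9!#$&^_.+-]*)"
    r" +([A-Za-z0-9!#$%&'+^_`{}~-]+)$"
)


def parse_charset_section(text):
    """Parse a charset section into a dict of MIME-type -> charset identifier."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "charset {":
        raise ValueError("expected 'charset {' on the first line")
    if len(lines) < 2 or lines[-1] != "}":
        # ASSUMPTION: the section ends with a closing bracket on its own line.
        raise ValueError("expected a closing '}' on the last line")
    entries = {}
    for line in lines[1:-1]:
        match = ENTRY.match(line)
        if match is None:
            raise ValueError(f"malformed charset entry: {line!r}")
        media_type, subtype, charset = match.groups()
        entries[f"{media_type}/{subtype}"] = charset
    return entries


sample = "charset {\n    text/html UTF-8\n    text/css UTF-8\n}\n"
print(parse_charset_section(sample))
```

Entries for media types other than text and application are rejected, in line with the media-type production.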
Example 1: Using the UTF-8 charset throughout
Example 2: Using the ISO-8859-1 charset for HTML documents
Key points to remember:
- The charset configuration section associates MIME-types with character set identifiers.
- A charset attribute is appended to the content-type response header.
- No charset declaration is sent, even if properly configured, if no content-type response header is generated.