Using MIME types to control the behavior associated with filename extensions
This note provides information about MIME-type configuration: associating filename extensions; declaring character sets; declaring encoding algorithms; and declaring content negotiation.
The content of files varies greatly, and all parties need to be able to communicate precisely which rules to follow when reading and writing files. For a long time, software used standard naming conventions — based primarily on a filename's trailing extension — to communicate what rules to use to read and interpret a file's contents. This proved to be ambiguous and untenable, and the software industry has moved to the use of content-types to convey this information.
Content-types are also known as MIME types. They are described in IETF RFC 6838, Media Type Specifications and Registration Procedures. Each content-type is composed of a media type and a subtype. Media types include text, application, image, audio, video, and multipart. Subtypes include values like mp4 and so forth.
Mapping filename extensions to content-types is a required part of server configuration. When configuring the server, the content-types section is used to associate filename extensions with content-type HTTP headers.
In addition to filename extension mapping, the server uses MIME-types to configure three other rules: character sets, compression algorithms, and content negotiation.
Text media use various character sets to represent glyphs (the letters of the alphabet, numbers, punctuation, etc.). Communicating that information to all parties is essential for a correct reading of a text document, and is done using the charset attribute of the content-type header. For more about character set identifiers, refer to IETF RFC 2978, IANA Charset Registration Procedures.
When a file's contents are compressed during transmission, the sender must know how to compress the outgoing bytes, and the receiver must know how to decompress the incoming bytes. The server is configured to handle this by associating content-encodings with MIME-types.
When a browser requests a file with a GET method, the server negotiates with the browser to determine which MIME-types are acceptable. The accept-types section is used to configure that; see the separate note about how that occurs. When a file is uploaded with a PUT method, the server can selectively decide what types of files are acceptable. For example, if a server is ready to handle TIFF images, it would signal that with an image/tiff entry in the accept-types section.
The content-types configuration section is used to associate filename extensions with content-types. It comprises a collection of two-part entries: the left-hand side is the filename extension (without a leading dot), and the right-hand side is the MIME-type. Filename extensions are case-sensitive. When a requested file is found on the server but no content-type is associated with its extension, no content-type header is sent with the response, in keeping with HTTP guidelines.
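The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the server's implementation: the `CONTENT_TYPES` dictionary, its entries, and the function names are hypothetical stand-ins for a parsed content-types section.

```python
import os

# Hypothetical in-memory form of a content-types section:
# filename extension (no leading dot) -> MIME-type.
CONTENT_TYPES = {
    "html": "text/html",
    "css": "text/css",
    "mp4": "video/mp4",
    "tiff": "image/tiff",
}


def content_type_for(path, content_types=CONTENT_TYPES):
    """Return the MIME-type mapped to the path's extension, or None."""
    # splitext keeps the leading dot (".html"); strip it to match the keys.
    extension = os.path.splitext(path)[1].lstrip(".")
    # A plain dict lookup is case-sensitive, so "HTML" does not match "html".
    return content_types.get(extension)


def response_headers(path):
    """Build response headers, omitting content-type when nothing is mapped."""
    headers = {}
    mime_type = content_type_for(path)
    if mime_type is not None:
        headers["content-type"] = mime_type
    return headers
```

Because the lookup is case-sensitive, a request for `/INDEX.HTML` produces no content-type header in this sketch unless an `HTML` entry is also present.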
The charset configuration section is used to declare character set identifiers for application media types. It comprises a collection of two-part entries: the left-hand side is the MIME-type, and the right-hand side is the character set identifier. When a requested file is served with a content-type header, any declared charset identifier is appended to that header. When configured in this fashion, there is no need to add an HTML meta tag or CSS @charset declaration to each file's inner contents. On the other hand, when no charset identifier is configured for a given MIME-type, the response headers omit the charset attribute completely.
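As a rough illustration of how a declared charset ends up in the header, consider the following Python sketch. The `CHARSETS` dictionary, its entries, and the `with_charset` helper are hypothetical; they stand in for a parsed charset section and are not taken from the server itself.

```python
# Hypothetical in-memory form of a charset section:
# MIME-type -> character set identifier.
CHARSETS = {
    "text/html": "utf-8",
    "application/json": "utf-8",
}


def with_charset(mime_type, charsets=CHARSETS):
    """Append '; charset=...' when an identifier is declared for the MIME-type."""
    charset = charsets.get(mime_type)
    if charset is None:
        # Nothing declared: the charset attribute is omitted completely.
        return mime_type
    return f"{mime_type}; charset={charset}"
```

With these sample entries, `with_charset("text/html")` yields `text/html; charset=utf-8`, while `with_charset("image/tiff")` returns the MIME-type unchanged.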
The content-encoding configuration section is used to define which compression algorithm to use based on media types. It comprises a collection of two-part entries: the left-hand side is the MIME-type, and the right-hand side is the compression algorithm.
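The following Python sketch shows one way a MIME-type could select a compression algorithm. It is a minimal illustration, assuming a hypothetical `CONTENT_ENCODINGS` dictionary in place of a parsed content-encoding section, and assuming that a MIME-type with no entry is sent uncompressed; the server's actual behavior for unconfigured types is not stated in this note.

```python
import gzip
import zlib

# Hypothetical in-memory form of a content-encoding section:
# MIME-type -> compression algorithm ('gzip', 'deflate', or 'none').
CONTENT_ENCODINGS = {
    "text/html": "gzip",
    "text/css": "deflate",
    "image/tiff": "none",
}


def encode_body(mime_type, body, encodings=CONTENT_ENCODINGS):
    """Return (payload, content-encoding header value or None)."""
    # Assumption: a MIME-type without an entry is treated like 'none'.
    algorithm = encodings.get(mime_type, "none")
    if algorithm == "gzip":
        return gzip.compress(body), "gzip"
    if algorithm == "deflate":
        # HTTP's "deflate" coding is the zlib format, which zlib.compress emits.
        return zlib.compress(body), "deflate"
    # 'none': send the bytes unchanged, with no content-encoding header.
    return body, None
```

The receiver reverses the step with `gzip.decompress` or `zlib.decompress`, chosen by the content-encoding header value.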
The accept-types configuration section is used to declare which types can be negotiated. It comprises a collection of two-part entries: the left-hand side is the literal string mime-type, and the right-hand side is a MIME-type that adheres to the IETF RFC 6838 specifications.
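The negotiation itself is covered in a separate note, so the Python sketch below is only a simplified guess at the general idea: a type is negotiable when it is declared in accept-types and also falls under one of the media ranges in the browser's accept header. The `ACCEPT_TYPES` list and all function names are hypothetical, and quality values (`q=`) are ignored for brevity.

```python
# Hypothetical in-memory form of an accept-types section: each
# 'mime-type <value>' entry contributes one MIME-type to this list.
ACCEPT_TYPES = ["text/html", "image/tiff", "video/mp4"]


def parse_accept(header):
    """Split an accept header into media ranges, dropping parameters such as q."""
    ranges = []
    for item in header.split(","):
        media_range = item.split(";")[0].strip().lower()
        if media_range:
            ranges.append(media_range)
    return ranges


def matches(media_range, mime_type):
    """True when mime_type falls under media_range ('*/*', 'type/*', or exact)."""
    if media_range == "*/*":
        return True
    range_type, _, range_subtype = media_range.partition("/")
    media_type = mime_type.partition("/")[0]
    if range_subtype == "*":
        return range_type == media_type
    return media_range == mime_type


def is_negotiable(mime_type, accept_header, accept_types=ACCEPT_TYPES):
    """True when the type is declared in accept-types and the browser accepts it."""
    if mime_type not in accept_types:
        return False
    return any(matches(r, mime_type) for r in parse_accept(accept_header))
```

In this sketch, a type that the browser would accept but that is not declared (for example `image/png` against `*/*`) is still rejected, which mirrors the idea that only declared types can be negotiated.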
An accept-types configuration section may appear in either the server section or a host section. When values occur in both the server and host sections, they are merged according to the server's standard merging rules.
media-type               ::= 'text' | 'application' | 'image' | 'audio' | 'video' | 'multipart'
subtype                  ::= (ALPHA | DIGIT | †)*
MIME-type                ::= media-type SOLIDUS subtype
filename-extension       ::= (ALPHA | DIGIT | ††)*
content-type-entry       ::= filename-extension SP MIME-type CR
content-types-section    ::= 'content-types' SP LEFT-CURLY-BRACKET CR
charset-identifier       ::= (ALPHA | DIGIT | †††)*
charset-entry            ::= MIME-type SP charset-identifier CR
charset-section          ::= 'charset' SP LEFT-CURLY-BRACKET CR
compression-algorithm    ::= 'gzip' | 'deflate' | 'none'
content-encoding-entry   ::= MIME-type SP compression-algorithm CR
content-encoding-section ::= 'content-encoding' SP LEFT-CURLY-BRACKET CR
accept-type-entry        ::= 'mime-type' SP* MIME-type CR
accept-types-section     ::= 'accept-types' SP LEFT-CURLY-BRACKET CR
† See section 4.2 of RFC 6838 for exact rules
†† Legal file system characters vary by platform
††† See IETF RFC 2978 for guidance
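To make the MIME-type and content-type-entry productions concrete, here is a small Python validator. It is a sketch under stated assumptions rather than the server's parser: the subtype pattern follows the restricted-name rules of RFC 6838 section 4.2 (so it requires at least one character, which is stricter than the starred production above), and the filename-extension pattern simply accepts any run of non-whitespace characters that does not begin with a dot, since legal characters vary by platform. All names are hypothetical.

```python
import re

# media-type alternatives exactly as listed in the grammar above.
MEDIA_TYPES = ("text", "application", "image", "audio", "video", "multipart")

# Subtype per RFC 6838 section 4.2: first character ALPHA or DIGIT, then up to
# 126 more from ALPHA, DIGIT, and ! # $ & - ^ _ . +
MIME_TYPE_RE = re.compile(
    r"(?P<media>" + "|".join(MEDIA_TYPES) + r")"
    r"/(?P<subtype>[A-Za-z0-9][A-Za-z0-9!#$&^_.+-]{0,126})"
)

# Assumption: an extension is any non-whitespace run without a leading dot,
# separated from the MIME-type by exactly one space (SP).
ENTRY_RE = re.compile(r"(?P<ext>[^\s.]\S*) (?P<mime>\S+)")


def parse_mime_type(text):
    """Return (media-type, subtype), or None if text is not a valid MIME-type."""
    match = MIME_TYPE_RE.fullmatch(text)
    if match is None:
        return None
    return match.group("media"), match.group("subtype")


def parse_content_type_entry(line):
    """Parse 'filename-extension SP MIME-type'; return (ext, mime) or None."""
    match = ENTRY_RE.fullmatch(line.rstrip("\r\n"))
    if match is None or parse_mime_type(match.group("mime")) is None:
        return None
    return match.group("ext"), match.group("mime")
```

Note that a type such as `font/woff` is rejected here only because `font` is not among the six media types the grammar lists.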
Example 1: Filename extensions associated with MIME-types
Example 2: Character set identifiers associated with MIME-types
Example 3: content-encoding algorithms associated with MIME-types
Example 4: MIME-types declared for browser accept-types negotiation
Key points to remember:
The content-types section associates filename extensions with MIME-types.
The charset section associates MIME-types with character set identifiers.
The content-encoding section associates MIME-types with compression algorithms.
The accept-types section lists MIME-types that may be served through negotiation with the browser.